As a second-order pooled representation, covariance matrix has attracted much attention in visual recognition, and some pioneering works have recently integrated it into deep learning. A recent study shows that kernel matrix works considerably better than covariance matrix for this kind of representation, by modeling the higher-order, nonlinear relationship among pooled visual descriptors. Nevertheless, in that study neither the descriptors nor the kernel matrix is deeply learned. Worse, they are considered separately, hindering the pursuit of an optimal representation. To improve this situation, this work designs a deep network that jointly learns local descriptors and kernel-matrix-based pooled representation in an end-to-end manner. The derivatives for the mapping from a local descriptor set to this representation are derived to carry out backpropagation. More importantly, we introduce the Daleckiǐ-Kreǐn formula from Operator theory to give a concise and unified result on differentiating general functions defined on symmetric positive-definite (SPD) matrix, which shows its better numerical stability in conducting backpropagation compared with the existing method when handling the Riemannian geometry of SPD matrix. Experiments on fine-grained image benchmark datasets not only show the superiority of kernel-matrix-based SPD representation with deep local descriptors, but also verify the advantage of the proposed deep network in pursuing better SPD representations. Also, ablation study is provided to explain why and from where these improvements are attained.