Caricature recognition is a novel and challenging problem. Because caricatures exaggerate and distort facial features, there is a large cross-modal gap between photographs and caricatures, which makes matching features across the two modalities nontrivial. To address this problem, a joint local and global metric learning method (LGDML) is proposed. First, a joint local and global feature representation is learnt with convolutional neural networks to capture both the discriminative features of local facial parts and the distinctive global features of the whole face. Next, to fuse the local and global feature similarities, a unified framework for learning the feature representation and the similarity measure is proposed. Various methods are evaluated on the caricature recognition task, and the results verify that both local and global features are crucial for caricature recognition. Moreover, experimental results show that LGDML achieves superior performance to state-of-the-art methods in terms of Rank-1 and Rank-10.
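
For readers unfamiliar with the Rank-1 and Rank-10 measures mentioned above, the following is a minimal sketch of how a Rank-k score is commonly computed in identification experiments. It assumes the usual convention, which the text above does not spell out: each probe (for example, a caricature) is compared against a gallery (for example, photographs), and a probe counts as a hit if its true identity is among the k most similar gallery entries. The function name `rank_k_accuracy`, the identity labels, and the similarity scores are all hypothetical and chosen only for illustration; they are not taken from the proposed method or its experiments.

```python
from typing import Sequence


def rank_k_accuracy(
    similarity: Sequence[Sequence[float]],
    probe_ids: Sequence[int],
    gallery_ids: Sequence[int],
    k: int,
) -> float:
    """Fraction of probes whose true identity is among the k most similar gallery items."""
    hits = 0
    for row, probe_id in zip(similarity, probe_ids):
        # Sort gallery indices by descending similarity to this probe.
        order = sorted(range(len(gallery_ids)), key=lambda j: row[j], reverse=True)
        top_k = [gallery_ids[j] for j in order[:k]]
        hits += probe_id in top_k
    return hits / len(probe_ids)


# Toy data (made up): 3 caricature probes scored against 4 gallery photographs.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # probe 0: true identity ranked first
    [0.4, 0.5, 0.8, 0.1],  # probe 1: true identity ranked second
    [0.7, 0.2, 0.1, 0.3],  # probe 2: true identity ranked second
]
probe_ids = [10, 11, 13]
gallery_ids = [10, 11, 12, 13]

rank1 = rank_k_accuracy(similarity, probe_ids, gallery_ids, k=1)
rank2 = rank_k_accuracy(similarity, probe_ids, gallery_ids, k=2)
```

In this toy setting only the first probe is matched at rank 1, while all three are matched within the top 2, so the score can only stay the same or increase as k grows. Rank-10 follows the same pattern with `k=10` on a gallery of realistic size.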