Person retrieval faces many challenges, including cluttered backgrounds, appearance variations across camera views (e.g., illumination, pose, occlusion), and the visual similarity between images of different persons. To address these issues, we propose a novel mask-based deep ranking neural network with a skipped fusing layer. First, to alleviate the problem of cluttered backgrounds, masked images containing only the foreground regions are used as input to the network. Second, to reduce the impact of appearance variations, a multi-layer fusion scheme is developed to obtain more discriminative fine-grained information. Finally, because person retrieval is a special case of image retrieval, we propose a novel ranking loss to optimize the whole network. This ranking loss further mitigates the interference of similar negative samples when producing ranking results. Extensive experiments on multiple benchmark datasets show that the proposed method outperforms state-of-the-art methods.
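To make two of these ideas concrete, the following minimal sketch illustrates foreground masking and a margin-based ranking objective. It is an illustration under stated assumptions, not the network or loss proposed here: the function names, the use of squared Euclidean distance, the `margin` parameter, and the choice to penalize only the hardest (closest) negative are all hypothetical stand-ins, since the abstract does not specify the exact formulation.

```python
from typing import List, Sequence


def apply_foreground_mask(image: List[List[int]], mask: List[List[int]]) -> List[List[int]]:
    """Keep a pixel where the binary mask is 1 and set it to 0 elsewhere.

    Assumption: the mask is a per-pixel foreground indicator with the same
    shape as the (single-channel) image.
    """
    return [
        [pixel if keep else 0 for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def margin_ranking_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    margin: float = 1.0,
) -> float:
    """Generic hinge-style ranking loss (a stand-in, not the proposed loss).

    The loss is zero once the positive is closer to the query than the
    hardest negative by at least `margin`.
    """
    d_pos = squared_distance(query, positive)
    # The hardest negative is the one closest to the query.
    d_neg = min(squared_distance(query, n) for n in negatives)
    return max(0.0, margin + d_pos - d_neg)
```

Focusing on the closest negative is one common way to address similar negative samples that would otherwise rank above the correct match; other formulations weight all negatives instead.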