To leverage the correlated information between modalities for cross-modal segmentation, we propose a novel cross-modal attention-guided convolutional network for multi-modal cardiac segmentation. Specifically, we first employ cycle-consistent generative adversarial networks to perform bidirectional image generation (i.e., MR to CT and CT to MR), which helps reduce modality-level inconsistency. Then, given the generated and original MR and CT images, we propose a novel convolutional network in which (1) two encoders learn modality-specific features separately and (2) a common decoder learns shareable features across modalities for a final consistent segmentation. In addition, we introduce a cross-modal attention module between the encoders and the decoder to exploit the correlated information between modalities. Our model can be trained in an end-to-end manner. Extensive evaluation on unpaired CT and MR cardiac images shows that our method outperforms the baselines in terms of segmentation performance.
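The dual-encoder, shared-decoder design with cross-modal attention described above can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the authors' exact architecture: all layer sizes, the specific form of the attention (a sigmoid-gated spatial map with a residual connection), and the class names are assumptions made for clarity.

```python
# Illustrative sketch: two modality-specific encoders, a cross-modal attention
# module, and one shared decoder. Layer widths and the attention form are
# hypothetical, chosen only to demonstrate the overall data flow.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Modality-specific encoder (one instance per modality, e.g. CT or MR)."""
    def __init__(self, in_ch=1, feat=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class CrossModalAttention(nn.Module):
    """Reweights one modality's features using the other modality's features."""
    def __init__(self, feat=16):
        super().__init__()
        self.query = nn.Conv2d(feat, feat, 1)
        self.key = nn.Conv2d(feat, feat, 1)
    def forward(self, f_self, f_other):
        # Attention map computed jointly from both modalities' features
        attn = torch.sigmoid(self.query(f_self) * self.key(f_other))
        # Residual connection preserves the original modality-specific signal
        return f_self * attn + f_self

class SharedDecoder(nn.Module):
    """Common decoder producing the segmentation map for either modality."""
    def __init__(self, feat=16, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat, n_classes, 4, stride=2, padding=1),
        )
    def forward(self, f):
        return self.net(f)

class CrossModalSegNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.enc_ct, self.enc_mr = Encoder(), Encoder()
        self.attn = CrossModalAttention()
        self.dec = SharedDecoder(n_classes=n_classes)
    def forward(self, ct, mr):
        f_ct, f_mr = self.enc_ct(ct), self.enc_mr(mr)
        # Each modality attends to the other's features before the shared decoder
        seg_ct = self.dec(self.attn(f_ct, f_mr))
        seg_mr = self.dec(self.attn(f_mr, f_ct))
        return seg_ct, seg_mr

ct = torch.randn(2, 1, 64, 64)   # CT slice (real or CycleGAN-generated)
mr = torch.randn(2, 1, 64, 64)   # corresponding MR input
seg_ct, seg_mr = CrossModalSegNet()(ct, mr)
print(seg_ct.shape, seg_mr.shape)  # both of shape (2, 4, 64, 64)
```

Because the decoder weights are shared across modalities, the network is pushed toward a common feature space for segmentation, while the per-modality encoders and the attention module retain and exchange modality-specific cues.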