Learning an optimal kernel plays a pivotal role in kernel-based methods. Recently, an approach called optimal neighborhood kernel learning (ONKL) has been proposed, showing promising classification performance. It assumes that the optimal kernel resides in the neighborhood of a "pre-specified" kernel. Nevertheless, how to specify such a kernel in a principled way remains unclear. To address this issue, this paper treats the pre-specified kernel as an extra variable and learns it jointly with the optimal neighborhood kernel and the structure parameters of support vector machines. To avoid trivial solutions, we constrain the pre-specified kernel with a parameterized model. We first discuss the characteristics of our approach and, in particular, highlight its adaptivity. We then demonstrate two instantiations, modeling the pre-specified kernel as a common Gaussian radial basis function kernel and as a linear combination of a set of base kernels in the manner of multiple kernel learning (MKL), respectively. We show that the optimization in our approach is a min-max problem that can be solved efficiently by employing the extended level method and Nesterov's method. We also give a probabilistic interpretation of our approach and use it to explain existing kernel learning methods, providing another perspective on their commonalities and differences. Comprehensive experimental results on 13 UCI data sets and two additional real-world data sets show that, via the joint learning process, our approach not only adaptively identifies the pre-specified kernel but also achieves classification performance superior to that of the original ONKL and related MKL algorithms.
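
As a minimal illustration of the two parameterized models named above for the pre-specified kernel, the following sketch builds a Gaussian radial basis function Gram matrix and an MKL-style weighted combination of base Gram matrices. It is not the paper's implementation and does not perform the joint min-max optimization. The function names (`rbf_kernel`, `combine_kernels`), the toy data, the bandwidth values, and the choice of nonnegative weights summing to one are all assumptions made for illustration.

```python
import math

def rbf_kernel(X, gamma):
    """Gaussian RBF Gram matrix: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K

def combine_kernels(base_kernels, weights):
    """MKL-style kernel: weighted sum of base Gram matrices.

    Nonnegative weights are assumed here (hypothetical constraint) so that
    the combination of positive semidefinite matrices stays positive
    semidefinite.
    """
    if len(base_kernels) != len(weights):
        raise ValueError("need one weight per base kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    n = len(base_kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, base_kernels)) for j in range(n)]
        for i in range(n)
    ]

# Toy data (illustrative only): three points in the plane.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]

# Instantiation 1: a single Gaussian RBF kernel with bandwidth parameter gamma.
K_rbf = rbf_kernel(X, gamma=0.5)

# Instantiation 2: a combination of RBF base kernels with different bandwidths.
bases = [rbf_kernel(X, g) for g in (0.1, 1.0, 10.0)]
K_mkl = combine_kernels(bases, [0.2, 0.5, 0.3])
```

In the first case the learnable parameter is the bandwidth `gamma`; in the second it is the weight vector. In either case, the approach described above would learn these parameters jointly with the neighborhood kernel rather than fixing them in advance as this sketch does.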