Compact and discriminative visual codebooks are preferred in many visual recognition tasks. A number of works in the literature hierarchically merge the visual words of a large initial codebook, but each implements this approach with a different merging criterion. In this work, we propose a single probabilistic framework that unifies these merging criteria by identifying two key factors: the function used to model the class-conditional distribution and the method used to estimate the distribution parameters. More importantly, by adopting new distribution functions and/or parameter estimation methods, our framework can readily produce a spectrum of novel merging criteria, three of which are discussed in this paper. For the first criterion, we adopt the multinomial distribution with Bayesian parameter estimation. For the second criterion, we combine the Gaussian distribution with maximum likelihood parameter estimation. For the third criterion, which shows the best merging performance, we propose a max-margin-based parameter estimation method and apply it with the multinomial distribution. An extensive experimental study systematically analyzes the performance of these three criteria and compares them with existing ones. The results show that the best criterion within our framework achieves the best overall merging performance among the compared merging criteria from the literature.