Many modern face verification algorithms use a small set of reference templates to save memory and computational resources. However, both the reference templates and the combination of the corresponding matching scores are heuristically chosen. In this paper, we propose a well-principled approach, named sparse support faces, that can outperform state-of-the-art methods both in terms of recognition accuracy and number of required face templates, by jointly learning an optimal combination of matching scores and the corresponding subset of face templates. For each client, our method learns a support vector machine using the given matching algorithm as the kernel function, and determines a set of reference templates, that we call support faces, corresponding to its support vectors. It then drastically reduces the number of templates, without affecting recognition accuracy, by learning a set of virtual faces as well-principled transformations of the initial support faces. The use of a very small set of support face templates makes the decisions of our approach also easily interpretable for designers and end users of the face verification system.