Person re-identification consists of recognizing individuals across the different sensors of a camera network. Although clothing appearance cues are widely used, other modalities, such as anthropometric measures and gait, could be exploited as additional sources of information. In this work we investigate whether the re-identification accuracy of clothing appearance descriptors can be improved by fusing them with anthropometric measures extracted from depth data acquired by RGB-D sensors in unconstrained settings. We also propose a dissimilarity-based framework for building and fusing multi-modal descriptors of pedestrian images for re-identification tasks, as an alternative to the widely used score-level fusion. The experimental evaluation is carried out on two data sets that include RGB-D data, one of which is a novel, publicly available data set that we acquired using Kinect sensors. Fusion with anthropometric measures increases the first-rank recognition rate of clothing appearance descriptors by up to 20%, while our fusion approach also reduces the processing cost of the matching phase.
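To make the contrast between the two fusion strategies concrete, the following is a minimal sketch under stated assumptions, not the implementation evaluated in this work. It assumes that each modality yields a fixed-length feature vector, that a dissimilarity-based descriptor is the vector of Euclidean distances from a sample to a small set of prototypes, and that fusion concatenates these vectors across modalities. All function names, the prototype sets, the weights, and the toy data are hypothetical.

```python
import numpy as np


def dissimilarity_vector(x, prototypes):
    """Represent x by its Euclidean distances to each prototype (one per row)."""
    return np.linalg.norm(prototypes - x, axis=1)


def fused_descriptor(sample, prototypes_by_modality):
    """Concatenate the per-modality dissimilarity vectors into one descriptor.

    Modalities are visited in sorted key order so that every sample
    produces components in the same positions.
    """
    return np.concatenate([
        dissimilarity_vector(sample[m], prototypes_by_modality[m])
        for m in sorted(prototypes_by_modality)
    ])


def score_level_distance(a, b, weights):
    """Baseline: weighted sum of per-modality distances (score-level fusion)."""
    return sum(w * np.linalg.norm(a[m] - b[m]) for m, w in weights.items())


# Toy data (hypothetical dimensions): 16-D appearance, 5-D anthropometry.
rng = np.random.default_rng(0)
prototypes = {
    "appearance": rng.normal(size=(4, 16)),
    "anthropometry": rng.normal(size=(3, 5)),
}
probe = {
    "appearance": rng.normal(size=16),
    "anthropometry": rng.normal(size=5),
}
gallery_item = {
    "appearance": rng.normal(size=16),
    "anthropometry": rng.normal(size=5),
}

# Dissimilarity-based fusion: one short vector per person, matched directly.
d_probe = fused_descriptor(probe, prototypes)
d_gallery = fused_descriptor(gallery_item, prototypes)
fused_distance = np.linalg.norm(d_probe - d_gallery)

# Score-level fusion: per-modality distances combined with assumed weights.
weights = {"appearance": 0.5, "anthropometry": 0.5}
score_distance = score_level_distance(probe, gallery_item, weights)
```

In this sketch the fused descriptor has one component per prototype (seven in total here), so matching a probe against a gallery item reduces to a single distance between two short vectors. This is one plausible way in which a dissimilarity-based representation can lower the cost of the matching phase.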