Urban land use mapping is crucial for effective urban management and planning because urban processes change rapidly. State-of-the-art approaches rely heavily on socioeconomic, topographical, infrastructural and land cover information about urban environments, which is fed into ad hoc classifiers for land use classification. Yet a major challenge remains: there is no universal and reliable approach for extracting and combining the physical and socioeconomic features derived from remote sensing imagery and social sensing data. This article proposes an ensemble learning approach that integrates a rich body of features derived from high-resolution satellite images, street-view images, building footprints, points of interest (POIs) and social media check-ins for the urban land use mapping task. The proposed approach can statistically differentiate the importance of input feature variables and helps explain the relationships between land cover, socioeconomic activities and land use categories. We apply the proposed method to infer the land use distribution at a fine spatial granularity within the Fifth Ring Road of Beijing and achieve an average classification accuracy of 74.2% over nine typical land use types. The results also indicate that our model outperforms several alternative models that have been widely used as baselines for land use classification.