Street-level imagery now provides comprehensive coverage of urban landscapes. Compared with satellite imagery, this emerging data source offers fine-grained observations not only of the physical environment but also for social sensing. Prior studies using street-level imagery have focused primarily on auditing the urban physical environment. In this study, we demonstrate the potential of street-level imagery for uncovering spatio-temporal urban mobility patterns. Our method assumes that the streetscape depicted in street-level imagery reflects urban functions and that streets with similar functions exhibit similar temporal mobility patterns. We show how a deep convolutional neural network (DCNN) can be trained to extract high-level scene features from street view images that explain up to 66.5% of the hourly variation in taxi trips along the urban road network. The study shows that street-level imagery, as a counterpart to remote sensing imagery, offers an opportunity to infer fine-scale human activity in urban regions and to bridge the gap between physical space and human space. This approach can therefore facilitate urban environment observation and smart urban planning.
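
The pipeline implied above could be sketched as follows; this is a minimal illustration under assumptions, not the study's actual implementation. It assumes an ImageNet-pretrained ResNet-50 as the scene-feature extractor, ridge regression as the model linking features to hourly trip counts, and hypothetical inputs (a `street_views/` folder of images keyed by road-segment ID and an `hourly_trips.csv` table of 24 hourly trip counts per segment).

```python
# Minimal sketch (assumed setup, not the paper's code): extract DCNN scene
# features per road segment and regress them against hourly taxi trip counts.
import os

import numpy as np
import pandas as pd
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 1. Pretrained DCNN as a fixed scene-feature extractor (ImageNet weights).
cnn = models.resnet50(weights="DEFAULT")
cnn.fc = torch.nn.Identity()  # drop the classifier head, keep 2048-d features
cnn.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def segment_feature(image_path: str) -> np.ndarray:
    """High-level scene feature vector for one street view image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return cnn(img).squeeze(0).numpy()

# 2. Build the design matrix: one feature vector per road segment,
#    paired with that segment's 24-hour taxi trip profile.
trips = pd.read_csv("hourly_trips.csv", index_col="segment_id")  # hypothetical file
X, Y = [], []
for seg_id in trips.index:
    path = os.path.join("street_views", f"{seg_id}.jpg")  # hypothetical layout
    if os.path.exists(path):
        X.append(segment_feature(path))
        Y.append(trips.loc[seg_id].to_numpy())
X, Y = np.stack(X), np.stack(Y)

# 3. Ridge regression from scene features to hourly trip counts; held-out
#    R^2 estimates how much of the hourly variation the features explain.
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
reg = Ridge(alpha=1.0).fit(X_tr, Y_tr)
print("held-out R^2:", r2_score(Y_te, reg.predict(X_te)))
```

In this sketch the DCNN is used only as a frozen feature extractor; the reported explained variance (up to 66.5% in the study) would depend on the backbone, the regression model, and how imagery is aggregated along road segments.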