Locale is one of the basic elements of place, referring to the physical setting and visual appearance of a place. Understanding and representing locale is important for studying human perception and human activity. However, quantitatively measuring the visual appearance of the urban environment has proven challenging because visual information is inherently ambiguous and semantically impoverished. To mitigate this issue, this paper employs street-level images as a proxy for urban physical appearance, uses recently developed image semantic segmentation techniques to parse an urban scene into scene elements, and proposes a framework for representing locale using those scene elements. The framework has two major components: a street scene ontology, aimed at the qualitative understanding of street scenes, and a street visual descriptor, aimed at their quantitative representation. A case study demonstrates the application and advantages of both components. A series of quantitative analyses demonstrates the framework's potential for investigating the connections between place and socioeconomic factors.
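To make the idea of a quantitative representation built from scene elements concrete, the following is a minimal sketch, not the paper's implementation. It assumes that a semantic segmentation model has already assigned an integer class label to every pixel of a street-level image, and that a descriptor can be formed from the share of pixels belonging to each scene element. The label set `SCENE_ELEMENTS`, the function name `street_visual_descriptor`, and the toy mask are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical scene-element labels; the actual categories are not
# specified in the abstract and depend on the segmentation model used.
SCENE_ELEMENTS = ["sky", "building", "tree", "road", "car"]


def street_visual_descriptor(label_mask, num_classes=len(SCENE_ELEMENTS)):
    """Return the fraction of pixels assigned to each scene element.

    label_mask is a 2D list of integer class indices, one per pixel,
    as a segmentation model might produce. Labels outside
    range(num_classes) count toward the total but are not reported.
    """
    counts = Counter(label for row in label_mask for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("label_mask contains no pixels")
    return [counts.get(c, 0) / total for c in range(num_classes)]


# Toy 4x4 "segmented image" (assumed data, for illustration only).
mask = [
    [0, 0, 0, 0],
    [1, 1, 2, 0],
    [1, 1, 2, 2],
    [3, 3, 3, 4],
]

descriptor = street_visual_descriptor(mask)
for name, share in zip(SCENE_ELEMENTS, descriptor):
    print(f"{name}: {share:.4f}")
```

Under this assumption, each image maps to a fixed-length vector whose entries sum to one, so images or streets can be compared with ordinary vector operations or related to other variables in a regression.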