Architects and urban planners need to understand residents' impressions of present and future space designs and to obtain their consensus on those designs. Virtual and augmented reality systems have been applied to show 3D models of external space to residents and to create walk-through videos. Virtual reality systems construct blocks of shopping areas, and augmented reality systems composite a resident's house and other buildings into a residential area. Residents can thus experience and evaluate planned environments before the development is carried out. Virtual and augmented reality technology focuses on the presentation of 3D space, and image processing focuses on the description and measurement of 3D space in images. These applications cannot promote residents' consensus on space designs, because they are only tools for building and presenting 3D space models. The resident's subjective interpretation of the design is not reflected in such systems.
On the other hand, researchers in architecture and town planning model the subjective interpretation of space designs using simplified physical measurements, such as the width and height of objects and spaces. Space design is regarded not as composite information such as color, texture, and form, but as a physical environment that human beings recognize. Environmental psychology deals with conceptual models of subjective interpretation that are based not on physical measurements but on concepts such as complexity and coherence, so these models are difficult to construct on computers.
Our purpose is to model the subjective interpretation of urban landscapes based on 2D and 3D descriptions. Once a model is established, the subjective interpretation of unknown information, such as 2D images and 3D space data, can be predicted. Establishing the model requires a process that discriminates between subjective interpretations of our environments and finds the 2D and 3D descriptors responsible for that discrimination.
Statistical analysis is applied to obtain the relationship between subjective responses and the still images and 3D space models of an urban landscape. This paper presents a method that, given a selected picture or video of an urban landscape, predicts the image-words with which residents interpret the landscape and retrieves, from a building database, 3D buildings that harmonize with the landscape according to the residents' interpretation. We set up a pilot study to predict interpreted words and to retrieve 3D buildings from the database for a street image of Minato Mirai 21 (Yokohama, Japan).
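To make the idea of relating subjective responses to descriptors concrete, the following is a minimal sketch and not the method of this paper. It assumes a single image-word ("open") rated on a 1 to 5 scale, two hypothetical image descriptors (`sky_ratio` and `edge_density`), Pearson correlation as the criterion for choosing a descriptor, a one-variable least-squares fit as the predictor, and closeness of rating as a stand-in for "harmonized". All names and numbers are invented for illustration.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical training set: one descriptor value per landscape image,
# plus the mean rating residents gave each image for the word "open".
descriptors = {
    "sky_ratio": [0.10, 0.20, 0.30, 0.40, 0.50],
    "edge_density": [0.50, 0.10, 0.40, 0.20, 0.30],
}
openness_rating = [1.0, 2.0, 3.0, 4.0, 5.0]

# Pick the descriptor most strongly correlated with the ratings (assumed criterion).
best = max(descriptors,
           key=lambda name: abs(pearson(descriptors[name], openness_rating)))
slope, intercept = fit_line(descriptors[best], openness_rating)


def predict(value):
    """Predict the 'open' rating of an unseen image from the chosen descriptor."""
    return slope * value + intercept


# Hypothetical building database: each 3D building already carries an 'open' rating.
building_db = {"tower_a": 4.5, "hall_b": 2.0, "block_c": 3.2}


def retrieve(target, database, k=2):
    """Return the k buildings whose rating is closest to the landscape's rating."""
    return sorted(database, key=lambda name: abs(database[name] - target))[:k]


candidates = retrieve(predict(0.35), building_db)
```

In this sketch a new street image is reduced to its `sky_ratio`, its rating is predicted from the fitted line, and the database is ranked by distance to that rating. A realistic model would use many descriptors and many image-words at once; the single-descriptor form is chosen only to keep the example short.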