Current and Recent Projects

This page describes some of the research projects we are currently working on (or have recently completed) in the lab, and highlights several areas of R&D being pursued. Brief descriptions and relevant background are given below. Click on the project titles to link to additional information. More on other lab research, completed projects, and collaborations can be found on our Research Interests, Publications, and Collaborators pages.

 

Indoor Navigation Research:

For most outdoor locations, a couple of button presses on the average mobile device will call up information about one’s current position, maps of the surrounding area, and detailed descriptions of nearby businesses. There is a glaring gap in, and a practical need for, similar functionality to support indoor travel. We have three funded projects in the lab looking at various aspects of learning and navigation of indoor spaces and the development of user interfaces and spatial technologies to aid this process. See our Indoor Wayfinding page for more details on the challenges posed by indoor spaces.

 

The technical goal for all of these projects is to develop a system that combines visual and multimodal interfaces, building databases, and infrastructure-independent sensors to provide real-time information about position, orientation, local geometry, and object identification. As it is not practical to retrofit all public buildings with active electronic sensors to support indoor positioning, the guiding philosophy of the systems we are researching is that they are inexpensive, can provide access through use of off-the-shelf hardware and software (whenever possible), and do not require building modifications. Although GPS-based systems have a large upfront cost for deploying the satellites, once up, they serve almost all outdoor spaces without the need for additional site-specific instrumentation. By contrast, indoor systems utilizing active positioning are not spatially generalizable. That is, since indoor positioning cannot rely on GPS-based systems due to signal attenuation and lack of suitable GIS databases, alternative technology is needed, which not only requires significant upfront installation costs but also is limited to supporting navigation of that specific space. Achieving general-purpose indoor navigation, analogous to the availability and functionality of outdoor GPS systems, would require every building to be individually instrumented, representing a huge allocation of time, expense, and upkeep that is unrealistic for large-scale implementation. Thus, one of the ongoing projects in the lab investigates various techniques for avoiding these problems by making use of embedded sensors on mainstream devices such as Android-based smartphones, existing electronic building infrastructure such as Wi-Fi, or passive positioning techniques such as RFID tags or fiducial markers, which are inexpensive and readily deployed. For more on this project, see Monoj Raja’s grad research video.


Systems Supporting Indoor Spatial Learning and Navigation Using Non-visual Displays:

Two related projects in the lab are investigating indoor navigation systems to support travel in complex buildings for people with low vision. This work is timely, as an estimated 12 million U.S. citizens have some form of uncorrected vision loss, of which 3.4 million are legally blind [1]. The World Health Organization (WHO) estimates that the number of people with some form of vision loss balloons to over 160 million people worldwide [2]. As these U.S. and global projections are expected to double by the year 2030 as a result of the aging of our population, it is critical that more research is carried out on non-visual displays to support safe and efficient spatial learning and navigation in a way that promotes independence, dignity, and wellbeing. Both of our projects in this arena are based on empirical experiments to understand the basic information needs and best presentation modes to support spatial knowledge acquisition and spatial behaviors, as well as usability studies to guide development of accessible information displays which are designed from the outset to meet the end-user requirements of this demographic.

One NSF project, entitled “Cyber Enhancement of Spatial Cognition for the Visually Impaired,” leverages expertise from human spatial cognition, machine vision, robotics, and sensor fusion algorithms (with collaborators K. Daniilidis, UPenn; S. Roumeliotis, UMN; and R. Manduchi, UCSC). Project-related work in the VEMI lab is currently addressing the information requirements for designing speech-based displays and 3D audio interfaces to be used in a “cyber assistant”. The cyber assistant is a real-time navigation system which will provide blind and low-vision people with dynamically-updated information about their position and orientation in buildings, as well as information about local geometry, identification of rooms, and indication of functional landmarks. For more information, check out the Cyber Assistant project webpage.

 

We are also working on another navigation system, sponsored by a Phase II SBIR project from NIH, with collaborators at Minneapolis-based Koronis Biomedical Technologies (KBT). This project proposes an indoor solution combining multimodal interface design with new highly sensitive GPS receivers augmented with advanced dead reckoning technology. KBT is leading the engineering and technological development activities on the project; the VEMI Lab's contribution relates to experimental design and studies to determine the best ways of presenting non-visual environmental information using real-time displays which could be implemented in a commercially viable package. Read here for a brief abstract of this Indoor GPS navigation project.

 

In addition to its application to persons with low vision, the development of non-visual indoor navigation systems is relevant more generally to situations where normal vision is impaired (e.g., for firefighters or emergency response personnel), or for use in indoor navigation systems to guide tourists (similar to the verbal instructions provided by GPS-based systems for vehicle navigation). Although the explicit goals of our current projects are scoped more narrowly, our results are also germane to these broader application domains, and future research is being designed to investigate how our current findings could be applied in these contexts. Of particular interest are applications to emergency response; more on this topic can be read on our lab Research and Development page.

 

Evaluating Acoustic Interfaces For Presenting Environmental Information.

The results of this research have direct application to interface design for the above projects. This research looks at the efficacy of using spatialized audio, where sounds are heard as emanating from a specific direction and distance in 3D space, as an interface for mobile displays that support non-visual environmental learning, spatial perception, and navigation. Research with these displays has both theoretical relevance for studying non-visual spatial cognition and important applications for use in non-visual navigation interfaces, such as accessible navigation systems for blind and low-vision persons or displays for anybody operating in low-light or high-cognitive-load situations. The true advantage of spatialized audio is that it is a perceptual interface. That is, you don’t need to explicitly think about what the spatial information being presented means, as the important cues about object direction and distance are intrinsic to the signal. For example, imagine that you are wearing a blindfold and standing at the door of your bedroom. Now imagine that every object in the room was able to speak its name. Through this information, you would hear the correct 3D location in space (direction, distance, height) of these objects from your location, much like you were seeing them. By hearing dresser, bed, desk, closet, etc. as emanating from their actual locations, you have immediate perceptual access to not only their egocentric relation with respect to your current position but also the interobject relations between these environmental elements, which is critical for developing a global understanding of the space. In a virtual acoustic display, these objects can be rendered in a simulated 3D space using head-related transfer functions (HRTFs) and, by monitoring the user’s head movements, delivered in a spatially realistic manner as they turn their head.
Read here to learn more about spatialized audio and virtual acoustic displays. Now contrast this type of information display with a traditional spatial language interface, where the direction and distance of objects are given through verbal descriptions. Thus, from the same place standing at your bedroom door, you might hear: “dresser 6 feet at 2 o’clock, bed 10 feet at 12 o’clock, desk 8 feet at 10 o’clock, and closet 8 feet at 8 o’clock”. In addition to being a longer message which is slower to convey than a spatialized auditory description, the major problem with spatial language interfaces is that they require cognitive mediation to interpret the signal (i.e., you must actively interpret what is being verbally described), whereas the information from 3D spatialized displays is still useful whether or not you know clock face units, left-right directions, or distances in feet. In this example, in order to walk from the door to the desk based on the spatial language description, you must first be able to process and accurately decode the meaning of 8 feet and 10 o’clock, and then correctly orient and walk to this position in the room. Interpreting these directions can be error prone and highly variable between people depending on their internal metrics of directionality and distance. Read here for more on speech-based navigation interfaces. The advantage of spatialized audio is that no such interpretative process and psychophysical matching is necessary. Much like vision, you simply perceive these things directly from what you hear and then act on them.
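To make the decoding step in the spatial language example concrete, here is a minimal Python sketch of the conversion a listener has to perform mentally. It assumes 12 o'clock means straight ahead and that each clock hour corresponds to 30 degrees clockwise; the function name clock_to_offset and the (right, forward) convention are illustrative choices, not part of any lab software.

```python
import math

def clock_to_offset(distance_ft, clock_hour):
    """Convert 'distance at N o'clock' into (right, forward) offsets in feet.

    Assumption: 12 o'clock is straight ahead; each hour is 30 degrees clockwise.
    """
    bearing_rad = math.radians((clock_hour % 12) * 30.0)
    right = distance_ft * math.sin(bearing_rad)    # positive = to the listener's right
    forward = distance_ft * math.cos(bearing_rad)  # positive = in front of the listener
    return right, forward

# The bedroom description from the text: name -> (distance in feet, clock hour)
room = {"dresser": (6, 2), "bed": (10, 12), "desk": (8, 10), "closet": (8, 8)}
offsets = {name: clock_to_offset(d, h) for name, (d, h) in room.items()}

for name, (right, forward) in offsets.items():
    print(f"{name}: {right:+.1f} ft right, {forward:+.1f} ft forward")
```

Under these assumptions the desk ("8 feet at 10 o'clock") lies roughly 6.9 feet to the left and 4 feet ahead. A spatialized audio display presents that position directly, without requiring this computation from the listener.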

Previous research has shown that virtual acoustic displays employing spatialized audio provide an intuitive non-visual interface for presenting environmental information [3,4]. However, research with these displays has generally focused on route guidance, or been done using sound sources that were placed at external positions in the environment thereby providing spatialized sound from physical speakers rather than virtual headphone displays. One disadvantage of virtual sound is that it offers fewer auditory localization cues than are available from natural sound. Being able to accurately determine direction is critical for auditory perception and localization. Thus, it is necessary to know whether the same level of performance is possible from virtual displays delivering auditory signals through headphones, as is possible from natural spatial hearing.

The purpose of one line of studies in the lab is to directly compare the accuracy of virtual acoustic displays using headphones vs. external speakers for perceiving target locations and encoding and recalling these locations from memory. If differences occur, we hope to determine the nature of the perceptual bias, so it can be compensated for in future virtual acoustic displays. Results are important for determining the efficacy of using these displays in real-time navigation systems (as is the goal of several projects in the lab). This work showed superior performance for virtual acoustic displays over external speakers for perceiving the direction of sounds and equivalent performance between modes for orienting to these target directions from memory. For more on this research, see Kit Cuddy’s grad research project video. 
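One common way to quantify directional accuracy and bias in pointing or orienting tasks of this kind is through signed and absolute angular error. The Python sketch below is illustrative only: the trial values are made up, the function names are hypothetical, and it is not a description of the analysis actually used in these studies.

```python
def signed_error_deg(response_deg, target_deg):
    """Signed angular error wrapped to [-180, 180).

    Positive values mean the response was clockwise of the target.
    """
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0

def summarize(trials):
    """Return (mean signed error, mean absolute error) for (response, target) pairs.

    The mean signed error estimates a constant directional bias; a plain
    arithmetic mean is a reasonable approximation only when errors are small.
    """
    errors = [signed_error_deg(r, t) for r, t in trials]
    bias = sum(errors) / len(errors)
    mean_abs = sum(abs(e) for e in errors) / len(errors)
    return bias, mean_abs

# Hypothetical (response, target) directions in degrees
trials = [(350.0, 10.0), (20.0, 10.0), (15.0, 10.0)]
bias, mean_abs = summarize(trials)
print(f"bias = {bias:.2f} deg, mean absolute error = {mean_abs:.2f} deg")
```

If a display mode showed a consistent nonzero bias, subtracting that bias from the rendered direction would be one simple form of compensation.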

 

Given our success in the previous project with virtual acoustic displays, we are currently investigating their efficacy to support more complex tasks in several new projects. One study compares virtual acoustic displays to external speakers for learning and updating of multi-object arrays. Another project addresses spatial learning and updating of object arrays using spatialized audio from four acoustic display modes. Since an objective of this project is to use spatialized audio in portable navigation systems, we must determine the best way to present environmental information through headphones (as it is impractical to use an array of external speakers in a portable context). As discussed earlier, traditional virtual acoustic displays spatialize the auditory information based on head motion. However, for this to work, a real-time head tracker is necessary, which is also not practical for a navigation system. Thus, we are interested in other modes of conveying spatialized information. This experiment employs four auditory conditions: a spatial language condition, where the user is given the distance and direction of each target in a three-object array; an auditory snapshot, where the stationary user hears a spatialized recording of the distance and direction of the objects panning across the array; a head-directed exploration mode, where the user’s head is tracked and objects are sounded when they are faced; and a hand-directed exploration mode, where the user’s hand is tracked and objects are sounded when they are pointed to. This last condition is of particular interest, as smartphones and other portable platforms have embedded accelerometers and gyroscopes, and as such, they can easily be tracked. Thus, if the user can accurately register their head coordinates and arm coordinates (e.g., matching the spatial direction of the sound they hear to the orientation of their arm), this will prove a good solution for implementation of inexpensive portable virtual acoustic displays. However, if these two perspectives cannot be brought into alignment, then this suggests that a more complicated (and less optimal) technique to support spatialized audio from smartphones will need to be determined. Once the optimal delivery method is established, research will be carried out with this interface to determine its efficacy in supporting spatial knowledge acquisition and navigation of indoor spaces (both real and virtual). For more on some of this research, see Shreyans Jain’s grad student project video.
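The hand-directed exploration mode can be illustrated with a short Python sketch that decides which object, if any, should sound for a given pointing direction. Everything here is an assumption made for illustration: the coordinate convention (0 degrees along +y, increasing clockwise), the 10 degree tolerance, and the function names. It does not describe the lab's actual implementation.

```python
import math

def bearing_deg(user_xy, target_xy):
    """Bearing from user to target in degrees: 0 = +y axis, increasing clockwise."""
    dx = target_xy[0] - user_xy[0]
    dy = target_xy[1] - user_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

def angular_gap(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)

def pointed_object(user_xy, pointing_deg, objects, tolerance_deg=10.0):
    """Return the object closest to the pointing direction within tolerance, else None."""
    best, best_gap = None, tolerance_deg
    for name, xy in objects.items():
        gap = angular_gap(pointing_deg, bearing_deg(user_xy, xy))
        if gap <= best_gap:
            best, best_gap = name, gap
    return best

# Hypothetical three-object array around a user standing at the origin
objects = {"A": (0.0, 5.0), "B": (5.0, 0.0), "C": (-3.0, 3.0)}
print(pointed_object((0.0, 0.0), 92.0, objects))   # pointing roughly toward B
print(pointed_object((0.0, 0.0), 200.0, objects))  # pointing at nothing
```

In a device-based version, pointing_deg would come from the phone's orientation sensors rather than being passed in by hand.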

 

Systems Supporting Indoor Spatial Learning and Navigation using Visual Displays. 

Thus far, we have only discussed non-visual interfaces, but there are obvious benefits of indoor navigation systems based on visual displays to support many of the same spatial activities and behaviors (e.g., indoor route-guidance, wayfinding, tourism, resource management, emergency response operations, and provision of location-based services). To this end, we are working on an NSF project entitled “Information Integration and Human Interaction for Indoor and Outdoor Spaces” with our SIE collaborator Mike Worboys. This project involves determining computational models and data structures for representing outdoor (O) and indoor (I) spaces and the creation of a unified O/I space model. The goal is to implement this unified model on a portable, context-aware device (e.g., smartphones instrumented with appropriate sensors) which will support seamless navigation assistance in O/I spaces. Project-related work in the VEMI Lab addresses interface development and usability testing of an interactive platform for this device. One of our goals is to determine the minimum information requirements for interfaces based on visual and multimodal displays to support optimal learning and navigation performance. This is done by comparing learning from virtual displays rendered using different sensory cues with different levels of information content. For instance, how does seeing an entire floor plan of a building compare to only seeing a small “bubble” around your immediate position? Does the addition of auditory information to a visual display improve spatial knowledge acquisition and memory? Does the amount or type of environmental information needed for accurate learning and efficient navigation performance differ between indoor and outdoor spaces?
Isolating the core spatial primitives that support spatial behavior makes it easier to deploy an end-user optimized navigation system on different platforms, our main interest being mobile devices which have limited computational and memory resources. While our initial research is being done using VR as a testbed, our eventual goal for this project is to port the system to a mobile platform that would allow a person to walk around the building whilst receiving dynamically-updated information about their location and heading. Spatialized descriptions of the surrounding geometric structure, including room numbers, building features, and other relevant landmarks, would be available from an interactive query system. Experiments with this system, based on navigation of both virtual environments in the lab and physical buildings, will help determine the information requirements and best delivery methods of environmental cues to support spatial learning and wayfinding. To learn more about this work, check out our O/I space Project webpage, our O/I Project Wiki, and Hengshan Li’s grad student research video. More about the issues of navigating in buildings can also be found on the Indoor Navigation page and from some of our publications.

 

Amodality and Functional Equivalence.

Much of the current work in this research domain is funded by an NIH project, entitled “Multimodally encoded spatial images in sighted and blind,” in collaboration with Jack Loomis (PI, UCSB) and Bobby Klatzky (Co-PI, CMU).

This line of research covers several domains but generally looks at spatial learning from different inputs and probes the nature of the spatial image, postulated as a three-dimensional representation of space which resides in working memory in the service of action. Some of this research investigates how learning from different input modalities builds up into a spatial image, generally assessed through behavioral tasks which require mental transformations of this representation through spatial updating or other spatial operations that require making judgments based on its instantiation, such as walking, pointing, or judgments of relative direction (JRDs) between learned target locations. In one related study, we investigated the ability to integrate separate object arrays from long-term memory and working memory into a common spatial image. For this task, participants learned an array of targets from a fixed origin by seeing them in a darkened room (long-term memory task). After a delay period where they performed a secondary task to ensure that the previously learned objects were not still in working memory, they learned a second array of objects from a new origin position (working memory task). They then had to imagine facing one target and point to another. Importantly, the start and end targets could be within or between long-term or working memory target arrays.     

 

Another recent study investigated the ability to build up spatial images using a long probe which extends the sense of touch out to two meters.

 

A third project is investigating the efficacy of using vibro-tactile feedback from a wireless device to learn about spatially-extended objects, i.e., those positioned outside of manipulatory space (the space within arm’s reach).

For more on some of these projects, check out the Spatial images webpage and Chris Bennett’s grad student research video.


Sonification of visual images using an iPad.

This project looks at methods to provide auditory access (sonification) to information which is usually accessible solely through vision (e.g., figures, graphs, plots, etc.). While screen readers, based on text-to-speech engines and algorithms to interpret the video model, are great for providing speech access to text-based material, these programs are unable to convey useful information about graphical content. This project investigates several methods to provide non-visual access to graphic material using both auditory and haptic display technologies. Our auditory display is based on experimental software developed in the lab incorporating an iPad and headphones. With this incarnation, the user runs their hand over the touch-sensitive surface of the iPad and hears an auditory signal whenever their finger touches an x-y position displaying visual information. By moving their finger along the auditory line, they are able to figure out the spatial parameters of the image. In principle, this technique could be used to recognize more complicated pictures of any kind, but our initial interest is in the ability to trace mathematical functions and accurately encode and represent this information. There are several modes to aid learnability. One varies the pitch of the auditory signal as a function of height on the image. Another moves a vertical aperture over the image so the user hears all information within that column. Others use spatialized audio, where information to the left of the display is heard from the left ear and information on the right is heard from the right ear (with information in the center heard equally by both ears).
Research with this auditory interface is being compared with two forms of information presentation through haptic exploration: hardcopy tactile renderings of the same graphical information and exploration of dynamic haptic displays using a specialized tactile mouse outfitted with three arrays of refreshable tactile piezo-electric elements.
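The pitch and left/right audio modes described for the iPad display can be sketched as two small mapping functions in Python. This is only an illustration under stated assumptions: the 220 to 880 Hz range, the logarithmic pitch scale, and the constant-power panning law are choices made for the example, and the function names are hypothetical rather than taken from the lab's software.

```python
import math

def pitch_for_height(y_norm, f_low=220.0, f_high=880.0):
    """Map normalized height (0 = bottom, 1 = top) to a frequency in Hz.

    Assumption: a logarithmic scale, so equal steps in height give equal
    musical intervals.
    """
    y = min(max(y_norm, 0.0), 1.0)
    return f_low * (f_high / f_low) ** y

def pan_gains(x_norm):
    """Constant-power (left, right) gains for normalized x (0 = left edge, 1 = right edge)."""
    x = min(max(x_norm, 0.0), 1.0)
    angle = x * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

# A touch at the horizontal center, halfway up the image
freq = pitch_for_height(0.5)
left, right = pan_gains(0.5)
print(f"{freq:.1f} Hz, left gain {left:.3f}, right gain {right:.3f}")
```

With these choices, a touch at the center of the display is heard equally in both ears, matching the behavior described above, while touches near either edge are heard almost entirely in the corresponding ear.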

 

Generating spatial descriptions from visual scenes.

This multi-investigator project, entitled “Perception of Indoor Scene Layouts by Machines and Visually Impaired Users,” with K. Beard, UMaine (PI); R. Moratz, UMaine; L.J. Latecki, Temple University; and K. Daniilidis, UPenn, studies computational methods for object detection, spatial scene construction, and natural language spatial descriptions derived from real-time visual images to describe prototypical indoor spaces (e.g., rooms, offices, etc.). The primary application of this research is to provide blind or visually impaired users with spatial information about their surroundings which is critical for accurate environmental learning or spatial behaviors but which is also often difficult to obtain from non-visual sensing and mobility aids (e.g., the long cane or guide dog). An important secondary motivation for this work is to provide a mechanism for computers, search engines, and robots to access and index visually-based spatial information through the same automatically generated verbal scene descriptions that aid the blind/low-vision user. More about our overall research agenda can be found on the Scene Description project webpage.


Current research in this domain relates to (1) determining standardized description templates that can serve as the basis of domain-specific ontologies to help with visual scene interpretation and representation in an abstract spatial model, (2) determining the simplest descriptions that can be given to users in order to support the most flexible spatial behaviors, and (3) studying whether verbal descriptions differ when derived from direct observation of the space vs. from camera images of that space taken from the same perspective. For more on this project, see Saranya Kesavan’s grad student research video.

 

3-D Virtual Modeling.

Our work in this area is motivated by several open questions about the design of virtual environments (VEs). For instance, what is the critical environmental information that should be rendered in a VE to promote spatial knowledge acquisition, accurate spatial representations, and support subsequent spatial behaviors outside of the VE? How is this information best presented to the user and what is the role of multimodal information delivery? Does the information content and display change as a function of the task? In general, our focus is on providing multimodal information sources in VEs and on finding the minimal information displays that support the greatest range of spatial behaviors.

Current work is investigating the best rendering methods and multimodal information presentation using the Panda3D development environment for conveying richly detailed, metrically accurate 3-D models of environments for use in demos, navigation experiments, and contract development projects.

 

R&D projects:

We have several lines of research and development in the lab that make use of our virtual environment and augmented reality technology. What makes our research, modeling, and simulations different from traditional approaches is our appreciation from the outset of multiple channels of information processing. Thus, our notion of virtual and augmented reality is not limited to visual information, but also includes 3D auditory information and tactile/haptic feedback (where appropriate). In addition, unlike traditional desktop 2D displays, the interactive nature of immersive VR, where the information perceived by the user is directly coupled to their movement, leads to a sense of presence (or feeling of being there) which is simply not possible from any other technology. We call our approach for the design and implementation of information displays multimodal information virtualization and augmentation (MIVA), and we are currently applying our techniques to several development projects. Check out our Lab Research and Development page to learn more about this approach and our projects.

 

MIVA 3D displays for wind energy installations:

This project makes use of immersive, multimodal 3D data virtualization techniques in the lab to create 3D viewsheds of current or potential wind farm sites. This is an important problem for site visualization or impact analyses, as it is generally extremely difficult for an observer to imagine what a proposed site would look like as a fully rendered 3D environment using traditional 2D viewshed analyses and almost impossible to imagine the sound of this installation. However, assuming we have the source data, we are able to model these parameters in VR and immerse a user in the simulation so they can experience the sight and sound characteristics of the proposed installation before physical construction. For instance, if a user wanted to hear how different wind conditions, ocean wave activity, or traffic noise affect the sound profile of the installation, a click of a button would load the appropriate sound profile and provide them with an immersive acoustic experience of these variables. Another button click could manipulate environmental parameters such as occlusion, providing the user with the experience of how ambient sound changes as a function of open or closed windows, or how their vista is affected by the presence or absence of vegetation.

MIVA 3D displays for Architecture:

This project makes use of immersive, multimodal 3D data virtualization techniques to render buildings (or any environments) before beginning physical construction. In this way, a client can envision the structure as it will look once finished and has a simple way to make on-the-fly changes to the design, such as altering a room’s shape, lighting, windows, material properties, etc. Besides providing a level of immersion and conceptualization of the space which is simply not possible from 2D drawings and small-scale physical models, the ability to tweak and modify the virtual 3D model based on direct user feedback helps ensure that clients are getting the design they want. Importantly, the use of fast, economical, and highly flexible virtual modeling techniques greatly reduces the potential for costly changes that might otherwise occur later in the design process.

MIVA 3D displays for emergency response training and preparedness:

This project makes use of immersive, multimodal 3D data virtualization techniques to create realistic scenarios which can be used for training purposes, for instance, simulating fires in buildings, evacuation drills, search and rescue missions, etc. In addition to being able to create whatever environment is necessary for the scenario, the use of virtual or augmented reality technology allows rehearsal and iterative learning through multiple training sessions, something which is often not possible, involves significant safety risks, or is prohibitively expensive when training with physical resources.

 

References cited:

[1] The Eye Diseases Prevalence Research Group. (2004). Causes and prevalence of visual impairment among adults in the United States. Archives of Ophthalmology, 122, 477-485.

[2] Resnikoff, S., Pascolini, D., Etya'ale, D., Kocur, I., Pararajasegaram, R., Pokharel, G. P., et al. (2004). Global data on visual impairment in the year 2002. Bulletin of the World Health Organization, 82(11), 844-851.

[3] Loomis, J.M., Golledge, R.G., & Klatzky, R. (1998). Navigation system for the blind: Auditory display modes and guidance. Presence, 7, 193-203.

[4] Oving, A.B., Veltmann, J.A., & Bronkhorst, A. (2004). Effectiveness of 3-D audio for warnings in the cockpit. Int. J. of Aviation Psychology, 14, 257-276.