The amount of transposition required to convert users’ physical actions into digital form has ebbed and flowed over the years.
Command line interfaces were direct: pressing a single key produced the corresponding character on-screen.
The introduction of the mouse resulted in a more abstract relationship: small hand movements and finger clicks, performed some distance from the screen, were translated into graphical sequences.
Currently the growth of touch is making interaction tangible again: users place their finger on what they want and get an instant response.
Gestural interfaces are best understood within this continuum of interaction development. They are more abstract, placing the user further from the content and relying on movements derived from metaphor rather than on direct manipulation.
Liat Rostock, Marketing Director of EyeSight, demonstrated what’s realistic within the limitations of today’s mass market mobile devices. EyeSight have built their technology to work within the confines of limited processing power, battery life and camera capabilities, such that it can recognise arm’s-length gestures from the VGA front-facing cameras common on mid- and high-end smartphones. At ‘living room’ distance – perhaps a few metres – a 1.3 megapixel camera is required. Both work best at a frame rate above 30 frames per second, but can function at 24.
EyeSight have also started to develop a basic gestural vocabulary. A left-right-left wave of the hand activates the input system to track a specific user. Bringing your hands together stops the camera from tracking you.
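The activation and deactivation gestures described above could be modelled as a simple state machine. The sketch below is a hypothetical illustration of that idea, not EyeSight’s actual API; the gesture names and method signatures are my own assumptions.

```python
# Sketch of a gestural vocabulary as a small state machine.
# Gesture names and the interface are illustrative assumptions,
# not EyeSight's real implementation.

class GestureTracker:
    """Tracks whether the camera is locked onto a specific user."""

    def __init__(self):
        self.tracking = False
        self.user_id = None

    def on_gesture(self, gesture, user_id):
        if gesture == "wave_left_right_left" and not self.tracking:
            # A left-right-left wave activates tracking of that user.
            self.tracking = True
            self.user_id = user_id
        elif gesture == "hands_together" and user_id == self.user_id:
            # Bringing the hands together stops the camera tracking.
            self.tracking = False
            self.user_id = None
        return self.tracking


tracker = GestureTracker()
tracker.on_gesture("wave_left_right_left", user_id=1)  # tracking begins
tracker.on_gesture("hands_together", user_id=1)        # tracking ends
```

Keeping the vocabulary in one explicit state machine would also make it easier to swap gestures for different cultural contexts.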
In testing, it is immediately apparent that the optimal vocabulary changes depending on the user context. A user holding a mobile device in one hand will require a different set of gestures to a user sitting in front of the home TV with both hands available. Rostock also stressed the need to reflect local cultural considerations, citing the example of a closed fist, which is inappropriate in a number of Asian countries.
(We looked at this topic as part of the Pathway #12 (‘Apply knowledge of brain processes for more effective mobile experiences’) working sessions at the last MEX event).
There are further challenges. This kind of camera-based tracking requires a certain amount of light, although Rostock claimed the glow from a screen is sufficient for arms-length interactions, while the illumination of a table lamp is enough in a room scenario.
User behaviour varies, particularly the speed and accuracy with which individuals perform gestures. Children tend to be fast and expansive, while the elderly are slower and more restrained.
Over time, I suspect a layer of software intelligence similar to the engines used for predictive text will improve recognition. EyeSight are already using video analysis, both manual and automated, to better understand how the system can be tailored to individuals.
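One way such a predictive layer might work is by learning how quickly an individual performs gestures and widening or narrowing the matching window accordingly. This is a minimal sketch of that idea under my own assumptions; the thresholds and learning rate are illustrative, not drawn from any real product.

```python
# Sketch: adapting gesture-recognition tolerance to an individual's
# observed speed, in the spirit of predictive-text engines.
# All parameters are illustrative assumptions.

class AdaptiveRecognizer:
    def __init__(self, base_duration=0.5, alpha=0.2):
        self.expected = base_duration  # expected gesture duration (seconds)
        self.alpha = alpha             # learning rate for the moving average

    def observe(self, duration):
        # Exponential moving average of how fast this user gestures.
        self.expected = (1 - self.alpha) * self.expected + self.alpha * duration

    def matches(self, duration, tolerance=0.5):
        # Accept gestures within +/-50% of the learned expectation.
        return abs(duration - self.expected) <= tolerance * self.expected


r = AdaptiveRecognizer()
for d in [0.3, 0.28, 0.32]:  # a fast, expansive user, e.g. a child
    r.observe(d)
# The window now centres on quick swipes; much slower,
# more restrained gestures would fall outside it.
```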
Cameras are also inherently power hungry. EyeSight are conscious of this and are working with CEVA to pre-integrate their recognition engine into CEVA’s DSP chips to minimise processing power requirements, but the action of image sensing itself will always be relatively power intensive. There is, perhaps, a role for combining the camera with other sensors here, using proximity or sound levels to determine when a user isn’t present and switching the camera into sleep mode during these periods.
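The sensor-fusion idea above could be sketched as a simple gating function: only wake the camera when cheaper, always-on sensors suggest someone is there. The sensor names and thresholds below are hypothetical assumptions for illustration.

```python
# Sketch: gating the power-hungry camera behind cheaper sensors.
# Sensor inputs and thresholds are hypothetical.

def camera_should_sleep(proximity_near, sound_level_db, idle_seconds,
                        sound_threshold_db=35, idle_threshold_s=30):
    """Return True when cheaper sensors suggest nobody is present,
    so the camera can be switched into sleep mode."""
    user_likely_absent = (not proximity_near) and (sound_level_db < sound_threshold_db)
    return user_likely_absent and idle_seconds > idle_threshold_s


# Quiet, empty room, idle for a minute: sleep the camera.
camera_should_sleep(proximity_near=False, sound_level_db=20, idle_seconds=60)
```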
In practice, EyeSight’s demonstration was impressive. It recognised swipes to silence and answer calls. Both Rostock and I could control the same device through its ability to track multiple people. Most impressively, it was able to recognise one or more fingertips at a distance of a couple of metres, allowing reasonable accuracy for detailed on-screen navigation. I don’t imagine this kind of mouse pointer-style interface as the most usable application for the technology, but as a technical feat the ability to recognise fingertips opens up many possibilities.
There were glitches. The demonstration systems needed to be reset a couple of times and there were moments when the recognition performed poorly. However, these tests were being performed on older hardware in an exhibition booth filled with background movement.
Performance aside, there is also the question of whether gestural interfaces represent a good user experience on mobile devices. First, consider whether they are ergonomically sound: will it cause discomfort to perform in-air gestures with one hand while holding the device with the other? Second, think about the alternatives: what are the scenarios where users couldn’t or wouldn’t want to interact with a touchscreen or physical keys?
I remain to be convinced on the ergonomics question, but the second consideration – demand – has a more obvious answer.
Gestural interfaces are well suited to partial attention or partial capability environments where some of the user’s cognitive or physical capacity is otherwise engaged: the in-car environment, sporting activities, or simply the growing number of users who interact simultaneously with several digital touchpoints in the home. In all of these instances, gestures – which generally have a lower cognitive loading than visually-reliant touch interactions – may allow users to multi-task more easily.
Not to mention, of course, the large number of users who for reasons of climate or occupation find themselves wearing gloves incompatible with capacitive touchscreens or too inaccurate for small buttons.
Gestures will be most effective, however, when combined with appropriate audio or haptic feedback. One of the biggest advantages of gestural interfaces in the mobile environment is the ability to perform actions without the visual precision required by touch. However, users naturally expect confirmation of an action and, without an audible or tactile response to a gesture, their instinct will be to seek a visual confirmation. This would negate the multi-tasking benefits of gestures.
Interaction designers, therefore, need to rethink traditional metaphors when optimising for gestural input. The ticks, ‘OK’ buttons, colour changes and highlights which work when you have the user’s full visual attention are simply not appropriate to gestural interactions. Perhaps in their place we will see light flashes, more sophisticated audio confirmations or haptic vibrations.
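A feedback layer along these lines might simply map each recognised action to a set of non-visual cues. The sketch below is a hypothetical illustration; the event names, channels and dispatch interface are my own assumptions, not any shipping API.

```python
# Sketch: routing gesture confirmations to non-visual channels
# (haptics, audio, light flashes) instead of on-screen ticks.
# Event and channel names are illustrative assumptions.

FEEDBACK = {
    "call_answered":  [("haptic", "double_pulse"), ("audio", "rising_tone")],
    "call_silenced":  [("haptic", "single_pulse")],
    "not_recognised": [("audio", "error_buzz"), ("light", "flash")],
}

def confirm(event, dispatch):
    """Send each confirmation cue for an event to its output channel."""
    for channel, cue in FEEDBACK.get(event, []):
        dispatch(channel, cue)


sent = []
confirm("call_answered", lambda channel, cue: sent.append((channel, cue)))
# 'sent' now holds the haptic and audio cues for an answered call.
```

Separating the mapping table from the dispatch logic would let designers tune the confirmation vocabulary per context (in-car, gloved, living room) without touching recognition code.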
These issues link with MEX Pathways #9 (Expand mobile interactions with the neglected dimensions of sound and tactility) and #12 (Apply knowledge of brain processes for more effective mobile experiences).