Google’s wandering, inevitable path to ambient augmentation

Would you like to explore a tangent? It concerns chickens, antiques and the future of ambient augmentation.

About twice a week I like to buy eggs from a local farm. It is a walk of a few miles, along a country road at first and then across fields. I enjoy it for the exercise and because I find a good walk often prompts new thinking.

Google knows all about this. There is a screen on my Android phone which can tell me exactly where I have been, for how long and whether I walked, cycled or drove my car.

Google also already knows the answers to all the random questions prompted by the visual stimuli I encountered on this walk.

Visual stimuli on my computer vision ramble

However, there’s a problem. While Google knows the answers, it doesn’t know the questions – yet:

  • What am I doing on 14th and 15th April?
  • Where could I find out more information about the Peterborough Antiques Fair?
  • Do I need to book a ticket?
  • Why does gorse smell like coconut?
  • Is the blossom earlier this year than last?
  • What kind of chickens are those?
  • Will those mushrooms kill me?

In the moment the questions materialised I was more interested in my walk. I was conscious of the visual stimuli flooding into my eyes – the sign advertising the antiques fair, the spring blossom opening in the hedgerows, the chickens pecking in the farm yard – but my desire for answers was far exceeded by wanting to be in the present, enjoying the experience of the walk.

There’s a limit to how many questions a user can ask explicitly in a day and, currently, Google needs that kind of specific input to know about the infinite curiosity of the world’s citizens. The two trillion searches Google processes every year may sound like a huge number, but it is not even the tip of the iceberg. For every question I type into my computer or speak into my phone, there are countless others I never will.

At some point, I am sure I will want the answers to all of those random, weird, highly personalised questions I thought about on that walk. I am also sure Google would like to answer them: its business is built on that premise.

How might a good digital experience complete the loop connecting this insatiable curiosity of humans with the infinite knowledge of the web?

Consider the limitations in the moment when the visual stimuli became questions:

  • I did not want to be interrupted.
  • I wanted to know more, but I didn’t know when, or how.
  • The only digital tool I was carrying (a smartphone) was designed for person-to-person communications rather than to interpret the world around me.

Now, consider the opportunities:

  • Much of my behaviour is already encoded in digital form (Google knows my movements, search history, calendar, photos and more).
  • Its search engine already knows the answers to the questions I had.
  • Machine learning techniques can already accurately interpret the visual stimuli I encountered and turn them into usable data.

We’re already close – very close – to closing the loop, but there are two missing links:

  1. A socially acceptable form-factor for a universal computer vision sensor. The next phase requires digital to see the user’s world in real-time and that will only happen once a novel form-factor is found. Smartphones and Glass-style headsets aren’t it.
  2. A digital exploration interface which subscribes to principles of quiet design. Answers should remain hidden until they’re summoned, awaiting the user’s command and never presuming to interrupt.

We know this storm is coming. We’re already seeing the first breeze stirring the trees with augmented reality products like Microsoft HoloLens, Google Glass and Pinterest Lens. Individually they are rightly regarded as niche experiments, but collectively they’re waypoints on a path to the next generation of digital experiences.

If you think back to my questions and imagine that Google could have seen and encoded my visual world in real-time, the technology already exists to answer them all:

  • Once recognised, the sign advertising the antiques fair in Peterborough on 14th and 15th April could be used to check my calendar availability, find the relevant website and offer me a simple confirmation to book a ticket.
  • Google’s image algorithms already recognise the distinctive bright yellow gorse flowers and could link me to the Wikipedia page which explains their coconut smell.
  • I take photos of spring blossom every year. My Google Photos account already contains the automatically geo-tagged, time-coded photos. Given that Google’s image algorithms can recognise blossom and that it has access to millions of other accounts which might contain corroborating images, answering the question ‘did photos of X appear earlier this year or last?’ is relatively straightforward.
  • The mushrooms, of course, are more challenging given the potential consequences, but the same principles apply to surfacing relevant information. (I didn’t risk it, in case you were wondering…and the elaborate chickens were actually guinea fowl!)
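
To make the blossom example concrete, here is a minimal sketch of the comparison, assuming the image recognition step has already happened and all we hold is a list of dates on which blossom was photographed. The function names are hypothetical and this is not a description of how Google Photos works; it only shows that once the photos are labelled and time-coded, the question reduces to comparing the earliest sighting in each year.

```python
from datetime import date


def first_sighting_day_of_year(photo_dates):
    """Map each year to the day-of-year of its earliest blossom photo."""
    earliest = {}
    for d in photo_dates:
        day = d.timetuple().tm_yday
        if d.year not in earliest or day < earliest[d.year]:
            earliest[d.year] = day
    return earliest


def days_earlier(photo_dates, this_year, last_year):
    """Return how many days earlier the first photo was this year.

    Positive means earlier this year, negative means later.
    Returns None if either year has no photos to compare.
    """
    earliest = first_sighting_day_of_year(photo_dates)
    if this_year not in earliest or last_year not in earliest:
        return None
    return earliest[last_year] - earliest[this_year]
```

A single account only records when I happened to take a photo, not when the blossom actually opened, which is why the corroborating images from other accounts matter. Comparing day-of-year values is also off by one after February in a leap year, a detail a real system would need to handle.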

The greater challenge is knowing when and how to surface these results. Every object Google recognises in my world could lead to myriad different answers. Google is already using machine learning at vast scale to become very good at guessing what a given user might be looking for: think about how often it suggests the right query from just the first few letters you entered into the search bar.

However, when a user initiates a search through keyboard or voice, they are signalling implicit confirmation that they’re ready to receive the result. In contrast, the answers Google could surface by visually decoding the world around its users might be wanted immediately, in a few seconds or several weeks later. Get this wrong and you risk a degree of cognitive overload which would fatally flaw the experience:

  • I might want near real-time information about the mushrooms. Unless they’re definitely edible, I won’t pick them. The answer could be presented in my field of vision through augmented reality or be waiting on the lockscreen of my phone when I pull it from my pocket to check.
  • The question about the blossom might be something I returned to later that day. Maybe it could be surfaced when I was writing a diary entry in the evening?
  • The information about the antiques fair, however, might not become relevant for several days or weeks. As the event approached, a sidebar in my Google Calendar on my PC could suggest it, along with the link to book.

We might conceptualise the lifecycle of these answers as an orbital system. The user has the ability to pull answers into their gravity when needed, but otherwise the answers remain at a distance, waiting to be called. Some may gradually move closer of their own accord in response to changing contextual factors like time, social relationships and environment. As they do so, they may emit ambient signals at the edge of the user’s digital sub-consciousness – not so loudly as to interrupt the conscious present, but quietly and patiently.
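
One way to picture the orbital idea is as a small data structure. The sketch below is purely illustrative: the class and function names are hypothetical, and it models only one contextual factor (time), leaving out social relationships and environment. Each answer has a "distance" from the user, the user can summon it at any moment, and only answers that have drifted inside a chosen horizon are allowed to emit a quiet signal.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PendingAnswer:
    question: str
    relevant_at: datetime  # assumed moment the answer is likely to matter
    summoned: bool = False  # set to True when the user explicitly asks

    def distance(self, now):
        """Hours until the answer is expected to matter (0 if summoned or due)."""
        if self.summoned:
            return 0.0
        hours = (self.relevant_at - now).total_seconds() / 3600
        return max(hours, 0.0)


def ambient_signals(answers, now, horizon_hours=24.0):
    """Return answers close enough to hint at quietly, nearest first."""
    near = [a for a in answers if a.distance(now) <= horizon_hours]
    return sorted(near, key=lambda a: a.distance(now))
```

In this picture the mushrooms would have a `relevant_at` of right now, the blossom that evening and the antiques fair some days ahead, so the fair stays out of sight until it drifts inside the horizon or I pull it in myself.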

The experience design challenges highlighted by this scenario leave me pondering several thoughts:

  1. While I used Google as my example, this could be any current or aspiring tech giant. Google has some scale advantages in refining machine learning and access to existing user data, but it could be undermined by a new entrant better able to reassure users about their privacy concerns.
  2. Creating an experience like the one described will require a multi-touchpoint and multi-sensory design approach from the outset. These are the user-centred design skills of the future.
  3. The biggest conceptual design challenge as we move towards these augmented and mixed reality experiences is how to give access to vastly more data while reducing cognitive load on the user. Principles of quiet and ambient design will be essential.
  4. Hardware still matters. Several new device types will be required, such as the universal computer vision sensor I mentioned and a way of getting a digital display direct to the eye. It will take incredible industrial design skills to make these simultaneously desirable, affordable and functional.

That was the tangent I explored on my walk. What do you think will come next?
