The short answer is: not much.
Well. Maybe we first have to talk about what it means to “see”. Vision is an extremely rich natural phenomenon. Most of us humans have the uncanny ability to turn light into meaning – as do many other species in the animal kingdom. Vision is mainly used for navigation and recognition. We use our eyes to detect objects in our environment and use the shapes and layout of these objects to navigate our way through life.
We also learn to recognise objects, spaces, and events. We can find the things we can eat to survive, approach our friends and avoid our foes, and recognise the spaces that are hospitable or forbidding. We can even distinguish different dynamic coincidences of objects and spaces that we call events. Seeing is an important way to mentally represent this world in which we want to reach our goals of survival and procreation, stay out of trouble, and have some fun along the way.
So let me focus on my favourite lens through which to look at visual perception and cognition. (Notice the abundance of visual metaphors in our language.) A prerequisite for dealing with the world is to incorporate the fact that we live in a three-dimensional (3D) world that is made of 3D objects in 3D spaces. So, our visual system needs to represent our world in 3D. The problem is that the optical projection in our eyes has a 2D structure in which the third dimension is lost; you can only detect from which direction a light ray arrives, not the distance to the place where it originated. Suffice it to say that we humans (and a plethora of other animals) have evolved to embody a solution to this problem. (No, stereo vision with two eyes is not the generic solution.) The question today is whether the iPhone has this capability as well.
You already know that the answer is NO, but I want to provide some more detail. What about the fact that an iPhone has a very decent camera? Even two cameras. The problem is that these cameras are not used to “see” in the sense we discussed. They are not used to navigate and recognise, but are used to make artefacts that we can look at. Those things are called photos and videos. So a lot of hardware and software in an iPhone is devoted to optimising our experience of looking at a photo or watching a video on a 2D display. The optical input is not used to create a 3D representation of the environment. And the cameras are definitely not used to increase the iPhone’s survival.
If you study the frameworks (software libraries) of the iOS 8 operating system, you will find that none of them contains any 3D vision algorithms. There are, however, some useful and interesting 2D image-processing algorithms. The Core Image framework contains a CIDetector class that detects faces, rectangles, and QR codes (2D barcodes). The face detector reports whether there are faces in the image; it can also locate the eyes and the mouth, and even tell whether there is a smile, or whether the eyes are open or closed. The rectangle detector returns the image coordinates of the four corners of the quadrilateral formed by the projection of a rectangle. Finally, the QR code detector also returns the four corner points of the projected square, and on top of that the decoded message that was hidden in the pattern.
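To make this concrete, here is a minimal sketch of how the CIDetector class is used in Swift. The `describeFeatures` function name is my own invention for illustration; the CIDetector types and option keys are the real Core Image API.

```swift
import CoreImage

// Hypothetical helper: run two of Core Image's 2D detectors over an image
// and print what they find. Note that everything reported lives in the
// 2D image plane — no 3D structure is recovered.
func describeFeatures(in image: CIImage) {
    let options = [CIDetectorAccuracy: CIDetectorAccuracyHigh]

    // Face detection, also asking for smile and eye-blink information.
    if let faceDetector = CIDetector(ofType: CIDetectorTypeFace,
                                     context: nil, options: options) {
        let faces = faceDetector.features(in: image,
            options: [CIDetectorSmile: true, CIDetectorEyeBlink: true])
        for case let face as CIFaceFeature in faces {
            print("Face at \(face.bounds), smiling: \(face.hasSmile)")
            if face.hasMouthPosition {
                print("Mouth at \(face.mouthPosition)")
            }
        }
    }

    // QR codes come back with their decoded payload attached.
    if let qrDetector = CIDetector(ofType: CIDetectorTypeQRCode,
                                   context: nil, options: options) {
        for case let code as CIQRCodeFeature in qrDetector.features(in: image) {
            print("QR code says: \(code.messageString ?? "<empty>")")
        }
    }
}
```

Notice that every result is expressed in image coordinates — bounding boxes, corner points, pixel positions — which is exactly the point: the detectors describe the photograph, not the scene behind it.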
So, today’s iPhone cannot see in the general sense of the term, though it has some visual capabilities in a limited sense. Those functionalities focus on 2D aspects of the image and do not tell us anything about the 3D structure of the scene that is photographed. Yet.