Superhuman powers by Computer Vision

I’ll be attending a workshop called “Science and Fiction” in two weeks. It’s going to be awesome, because I have been a big fan of sci-fi since I was a little kid. I’ve volunteered to give a talk on the science-fiction-style applications of computer vision. A major field of applications is, of course, robotics. Robots that can actually see and perform visually guided tasks will be a reality in a couple of dozen years (at least in my opinion). That is going to be cool and will be an exciting application of computer vision, but in light of this workshop I am thinking more about the ways in which computer vision can give normal people what we would now call superhuman powers. So, let’s say computer vision (classification, detection, tracking, ...) is solved and you carry around a sufficiently good camera and computer. Then these are the superpowers you could possess, as far as I have come up with so far:

  1. Superhuman memory: The system will record everything you see and can answer questions such as “Where do I know this guy from?”, “Where did I leave my keys?”, “How many people were at that party last night?” and so on.
  2. Superhuman perception: Think Dustin Hoffman in Rain Man. Tasks like “counting the number of toothpicks on the floor” will be done instantaneously by your vision system. We can go even further: the system could make a good guess at the age, height, weight, etc. of everyone around you. Even more freaky is the idea that mind reading, or at least lie detection, could be possible. A camera that captures facial expressions, body language and the heat coming off the face of the person opposite you (given that the camera can capture infrared) could possibly be used to do this.
  3. Superhuman control over machines: Right now Human Machine Interaction (HMI) is very limited. But with a good vision system and a good machine learning algorithm you could train a system that allows you to teleoperate any accessible device in your vicinity with very little effort. My vision would be to tell the coffee maker to do its work, or to turn the mannequins in a shop window, with a simple pointing gesture (both of which are theoretically possible today, but I’m thinking of a situation where HMI of this level is a natural part of everyday life).
  4. Superhuman instant knowledge about the surroundings: Augmented reality in an unconstrained world would allow a user to be presented instantaneously with relevant information about whatever they are looking at. For example, when you are looking at the Mona Lisa in the Louvre, the vision system would detect and register it and present additional information in a head-up display (HUD). Or, when you are fixing your car, the system could insert pictorial instructions for the next steps into your field of view.

All of those things would be really cool but I am still not convinced that this is all the future has to offer. Perhaps some further brainstorming will unveil some more.

Computer vs. human vision

I just read this very nice paper by Shimon Edelman entitled “Object Categorization: Computer and Human Vision Perspectives”, a paper that deserves all the capital letters in its title :-)

Shimon Edelman: Object Categorization: Computer and Human Vision Perspectives

I’ve been playing around with the idea that computer vision and human vision are two really different things for quite a while now, and this paper argues exactly in this direction. There are several key differences between human and computer vision these days. First, human vision has evolved as a means to drive our actions in a purposeful way, while computer vision is usually just designed to mimic some aspect of our visual system, such as object recognition. Second, our visual system is trained on a huge number of images presented as a continuous stream, while computer vision systems are mostly trained on a small number of images. The usual challenge for the Caltech 101 object recognition database is to learn each category from 5 examples. Humans can solve Caltech 101 perfectly, but that is completely different from the algorithmic challenge: I don’t think we could solve this task if all we had ever seen in our lives were 5 pictures per category. A third major difference is that we are mobile and can interact with the world. For us, vision serves a crucial, practical purpose. We don’t learn to tell the difference between a chair and a table by carefully analyzing five pictures of each class; we know what a chair is because we know what an object needs in order for us to sit on it. A computer vision algorithm can’t interact with the world (a robot could, but robots usually don’t learn; they are programmed for specific tasks), and therefore the difference between a chair and a table is meaningless to it.
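To make the “5 examples per category” setting concrete, here is a minimal sketch of how such a few-shot split can be drawn. It assumes the dataset is simply a list of (image_id, category) pairs; the function name few_shot_split and the toy image ids are hypothetical, and the actual benchmark protocol may differ in details (for example, how many test images per category are used).

```python
import random
from collections import defaultdict


def few_shot_split(labeled_images, k=5, seed=0):
    """Split (image_id, category) pairs into a k-shot training set and a test set.

    Hypothetical helper: for every category exactly k examples are drawn at
    random for training, and all remaining examples of that category go to
    the test set.
    """
    by_category = defaultdict(list)
    for image_id, category in labeled_images:
        by_category[category].append(image_id)

    rng = random.Random(seed)  # fixed seed so the same split can be reproduced
    train, test = [], []
    for category, ids in sorted(by_category.items()):
        # A category with k or fewer images would leave nothing to test on.
        if len(ids) <= k:
            raise ValueError(f"category {category!r} has only {len(ids)} images")
        shuffled = ids[:]
        rng.shuffle(shuffled)
        train += [(i, category) for i in shuffled[:k]]
        test += [(i, category) for i in shuffled[k:]]
    return train, test


# Toy example with made-up ids: three categories, ten images each.
data = [(f"img_{c}_{i}", c) for c in ("chair", "table", "cup") for i in range(10)]
train, test = few_shot_split(data, k=5)
print(len(train), len(test))
```

Drawing the split separately for each category keeps the classes balanced, so the learner really does see only five pictures of a chair. Because a different seed yields a different training set, it is common to average results over several random splits.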

When designing a computer vision system we have to decide what we are aiming for. If we just need a technical solution for a clearly defined problem such as face detection or object tracking, the current approaches are probably on the right track. But if we want to build a system that can see just like we humans see, we will have to rethink the whole approach. We have probably been fooled by our own visual system because it works so nicely and effortlessly. But what the neural system does in its early stages and what we think “seeing” means are two different things. What we perceive of the visual world is far more complex than the photons that hit our eyes. Rebuilding these capabilities might not be possible, since it took evolution quite a long time to get to this point, and vision is just one part of the human perception of the world. This, however, raises the question of what a perceptual system would look like if it evolved on the web. What kind of affordances does an environment consisting of text and still images, organized in a fairly unstructured way, provide? I think this is a very exciting question and I will have to think about it.