The main problem I have with neural networks for computer vision is that they do not give me understanding. Even the best network, with 99.91% accuracy on the MNIST handwritten digits dataset, cannot give me any insight. It does not allow me to observe how it actually performs its classification.
I have been watching Norman Wildberger’s videos on all things mathematics for about 10 years. To say that I have learned a lot is the understatement of the decade.
His most recent video is a recorded talk from July this year titled “How Chromogeometry transcends Klein’s Erlangen Program for Planar Geometries”. It is fascinating throughout, but my interest was especially piqued when, at 25:13, he starts talking about ellipses.
I am building systems that can understand what they see. In this day and age, the necessary hardware is easily accessible since a digital camera and a computer can now be purchased for well under € 100. It is the software that is the real challenge.
A major assumption in modern computer vision is that you have to track points on surfaces in order to see in 3D. You can use 2 images from 2 static cameras (“stereo”), or 2 images from 1 moving camera (“motion”).
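To make the stereo case concrete, here is a minimal sketch of what a tracked point buys you once the correspondence is known. It assumes two identical, rectified cameras side by side, so that depth follows from the textbook relation Z = f · B / d. The function name and the example numbers are my own illustration, not taken from any particular library.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (in metres) of a point seen by two rectified cameras.

    focal_px     -- focal length in pixels (assumed equal for both cameras)
    baseline_m   -- distance between the two camera centres, in metres
    disparity_px -- horizontal shift of the same point between the images
    """
    if disparity_px <= 0:
        # Zero disparity means the point is infinitely far away;
        # a negative value means the correspondence is wrong.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 700 px focal length, cameras 10 cm apart,
# and a point that shifts 35 px between the left and right image.
z = depth_from_disparity(700.0, 0.10, 35.0)
```

The formula itself is trivial. The hard part is finding the disparity, which is exactly the correspondence problem.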
One pleasant surprise for computer vision on a mobile device is that we can detect the 3D orientation of the camera from other sensors. An iPhone has an accelerometer and a gyroscope (among complementary sensors not discussed here).
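As a rough sketch of how part of that orientation can be recovered: when the device is at rest, the accelerometer effectively measures the direction of gravity, and two tilt angles follow from it. The axis convention below (z pointing out of the screen, reading equal to +1 g on z when lying flat) and the function name are assumptions for illustration only. Rotation about the vertical axis cannot be found this way, and this is not how any specific phone API reports orientation.

```python
import math


def tilt_from_accelerometer(ax: float, ay: float, az: float) -> tuple[float, float]:
    """Return (pitch, roll) in degrees from a resting accelerometer reading.

    Assumed convention: the reading is the gravity direction in device
    coordinates, in units of g, and equals (0, 0, 1) when the device
    lies flat with its screen facing up.
    """
    # Roll: rotation about the device's x axis.
    roll = math.atan2(ay, az)
    # Pitch: rotation about the device's y axis.
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return math.degrees(pitch), math.degrees(roll)


# Device lying flat: no tilt at all.
flat = tilt_from_accelerometer(0.0, 0.0, 1.0)
# Gravity entirely along the y axis: a quarter turn of roll.
on_edge = tilt_from_accelerometer(0.0, 1.0, 0.0)
```

In practice the gyroscope is needed as well, because the accelerometer alone cannot separate gravity from the acceleration of the moving device.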
Our imagination is a powerful cognitive skill. When I walked into the living room of my new apartment, I experienced a rectangular empty space with a dusty concrete floor and hollow sounding acoustics. But in my mind I was already furnishing and decorating. I imagined a blue carpet on the floor, the walls lined with bookcases, a large table on the far end, and a comfortable couch near the window.
I am right-eyed, a phenomenon also called “master eye” or “ocular dominance”.
So my left eye is the lazy one. And I don’t think it was properly treated when I was growing up. What I remember is that I saw double while reading: two images floating on top of each other. I could not fuse the two images into one “percept” of the book in front of me. I could accomplish that fusion easily for normal objects that were further away than the words on a page at arm’s length.