Since cameras were first invented, virtually all have required a lens to capture a focused image. That may soon end, thanks to research at the Tokyo Institute of Technology that applies advanced machine learning to a new generation of image sensing.
In recent years, we have seen an explosion of smaller, lighter and cheaper cameras, many of them built into our smartphones. Designers want the next generation of cameras to be compact enough to be installed anywhere. Until now, that miniaturization has been limited by the need for a lens on the camera.
That is about to change. Recent advances in computing allow the lens to be abandoned entirely and the image to be recovered through computational reconstruction instead. This could lead to the first high-quality, lens-less camera: ultra-thin, lightweight and low in cost.
The new approach to camera design relies on a “Vision Transformer.” With this technology, global reasoning can be applied across the entire image sensor to identify and analyze the light pattern that strikes it.
Researchers at the Tokyo Institute of Technology have developed a new image reconstruction method that shortens computation time and provides high-quality images. Prof. Masahiro Yamaguchi, a member of the research team at Tokyo Tech, said, “Without the limitations of a lens, the lens-less camera could be ultra-miniature, which could allow new applications that are beyond our imagination.”
The new technology uses a Vision Transformer (ViT), a machine learning technique. The proposed ViT is better at global feature reasoning than conventional approaches because of its novel structure of multistage transformer blocks with overlapped “patchify” modules.
This structure allows ViT to learn image features efficiently in a hierarchical representation. That makes it able to address the multiplexing property of lens-less imaging, in which light from each point in the scene is spread across many pixels of the sensor, and to avoid the limitations of conventional deep learning methods based on convolutional neural networks (CNNs).
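To give a feel for what an overlapped "patchify" step does, here is a minimal sketch in Python with NumPy. It is an illustration only, not the researchers' code: the function name `overlapped_patchify`, the 8x8 test image, and the patch and stride sizes are all assumptions chosen for readability. The idea is simply that when the stride is smaller than the patch size, neighboring patches share pixels.

```python
import numpy as np

def overlapped_patchify(image, patch=4, stride=2):
    """Cut a 2-D image into square patches and flatten each into a token.

    With stride < patch, neighboring patches overlap. (Hypothetical helper
    for illustration; sizes are arbitrary assumptions.)
    """
    h, w = image.shape
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    tokens = [image[r:r + patch, c:c + patch].ravel()
              for r in rows for c in cols]
    return np.stack(tokens)

# Toy 8x8 "sensor image" with distinct pixel values
image = np.arange(64, dtype=float).reshape(8, 8)

overlapping = overlapped_patchify(image, patch=4, stride=2)
non_overlapping = overlapped_patchify(image, patch=4, stride=4)

print(overlapping.shape)      # (9, 16): a 3x3 grid of 4x4 patches
print(non_overlapping.shape)  # (4, 16): a 2x2 grid of 4x4 patches
```

In this toy case the overlapping version produces nine tokens instead of four, and the right half of one patch is the left half of the next, so information at patch borders is not cut apart.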
The typical optical hardware of the lens-less camera consists of a thin mask and an image sensor. The image is then reconstructed using a mathematical algorithm. The mask and the sensor can be fabricated together in established semiconductor manufacturing processes for future production.
The mask optically encodes the incident light and casts patterns on the sensor. Though the cast patterns are completely non-interpretable to the human eye, they can be decoded computationally with knowledge of the optical system.
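The encode-then-decode idea can be sketched with a toy simulation. The code below is an assumption-laden illustration, not the Tokyo Tech method: it models the mask as a random binary pattern, treats the sensor reading as a circular convolution of the scene with that pattern, and decodes with a classical Wiener-style inverse filter rather than a Vision Transformer. The scene, the mask, and the names `encode` and `wiener_decode` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scene: two bright points on a dark background
scene = np.zeros((32, 32))
scene[8, 8] = 1.0
scene[20, 25] = 0.5

# Hypothetical mask: a random binary pattern, normalized to unit sum
mask = (rng.random((32, 32)) > 0.5).astype(float)
mask /= mask.sum()

def encode(scene, mask):
    """Simulate the sensor reading as a circular convolution (an assumption):
    every scene point casts a shifted copy of the mask pattern."""
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(mask)))

def wiener_decode(measurement, mask, eps=1e-6):
    """Classical model-based decoding with a regularized inverse filter.
    It needs explicit knowledge of the mask pattern."""
    m = np.fft.fft2(mask)
    inverse_filter = np.conj(m) / (np.abs(m) ** 2 + eps)
    return np.real(np.fft.ifft2(np.fft.fft2(measurement) * inverse_filter))

measurement = encode(scene, mask)            # looks like noise to the eye
recovered = wiener_decode(measurement, mask)

# Location of the brightest recovered point
peak = np.unravel_index(np.argmax(recovered), recovered.shape)
print(peak)
```

In this noise-free toy setting the brightest recovered point lands where the brightest scene point was placed, even though the simulated sensor reading itself bears no visual resemblance to the scene. Real measurements include noise and model mismatch, which is where simple decoders of this kind struggle.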
With ViT, the influence of model approximation errors is dramatically reduced because the machine learning system interprets the physical model. The proposed ViT-based method also uses global features in the image and is suitable for processing cast patterns that spread over a wide area of the image sensor.
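What "global" means here can be illustrated with the self-attention operation at the core of transformer models. The sketch below is a generic, single-head version in NumPy with random weights; it is not the architecture from the research, and the token count, dimensions and variable names are assumptions. The point is only that every token (image patch) receives a weighted contribution from every other token, however far apart they sit on the sensor.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Generic single-head scaled dot-product self-attention.

    Returns the mixed tokens and the attention weights, where row i holds
    how strongly token i draws on each of the other tokens.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax per row
    return weights @ v, weights

rng = np.random.default_rng(0)

# Hypothetical input: 9 patch tokens, 16 values each
tokens = rng.normal(size=(9, 16))

# Random projection matrices stand in for learned weights (an assumption)
w_q = rng.normal(size=(16, 8)) * 0.1
w_k = rng.normal(size=(16, 8)) * 0.1
w_v = rng.normal(size=(16, 8)) * 0.1

mixed, weights = self_attention(tokens, w_q, w_k, w_v)
print(mixed.shape, weights.shape)  # (9, 8) (9, 9)
```

Because the softmax weights are all strictly positive, each output token mixes information from all nine inputs. A single convolution layer, by contrast, only combines pixels inside its small kernel window.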
The research team performed optical experiments that suggest the lens-less camera with the proposed reconstruction method can produce high-quality, visually appealing images, with post-processing fast enough for real-time capture.
A lens-less camera is used much like a conventional camera, and the research suggests that with additional development it can produce higher-quality images with greater sharpness and detail.
If a lens-less camera is not bound by the optical constraints of a lens, namely the need to bend light and the distance required to bring an image into focus, then there may be little limit to how small a camera can become. In fact, the camera could become all but invisible.
“We realize that miniaturization should not be the only advantage of the lens-less camera,” said Xiuxi Pan of Tokyo Tech. “The lens-less camera can be applied to invisible light imaging, in which the use of a lens is impractical or even impossible. In addition, the underlying dimensionality of captured optical information by the lens-less camera is greater than two, which makes one-shot 3D imaging and post-capture refocusing possible.”