We use RGB plus IR cameras for depth perception and low-light operation, enabling effective navigation and interaction with users.
Yeah, that's a good question. I think even taking a step back, I'll address why we even use cameras in the first place.
So generally, you might have heard this before: our philosophy here is we want to put robots in people's homes. And people's spaces are very, very unique and quite complex. Taking cues from nature, humans are able to figure out their way in any space mostly because they use their eyes. So vision is a very, very powerful sensor. It gives you the most information about the environment in a way that, say, a LiDAR, a sonar, a radar, or even an ultrasonic or time-of-flight sensor just doesn't.
There are all these different kinds of sensors out there, but there's just not that much information coming back from them. The second thing is that cameras ended up in smartphones. You know, thank you, Steve Jobs. And we've just built so much supply chain and industry around cameras.
So cameras are very widely available and much cheaper compared to a lot of the other sensor options out there. And then, of course, self-driving came around, and people started developing even more options in cameras. So from that perspective, it's always been a high-level technical decision to make sure we only use cameras on the robot. Now, within cameras, there are a few different ways we could have gone. Initially, we started out with just RGB cameras.
So you have your red, green, and blue pixels, and you get that information. RGB is important for us because even if the robot could somehow perceive perfect depth, which you can get from a time-of-flight sensor, for example, you'd still need color to build a user-facing map. That map is essentially the most important communication tool between the robot's understanding of the world and what the customer sees on their end. So RGB was necessary for that purpose. And then slowly we realized that, just like humans, an RGB camera has the limitation that at nighttime you just don't have enough light coming through.
You're not going to be able to operate in the dark. So we realized we needed to add IR illumination to the robot, and IR sensitivity on the sensor as well. I'll point to the robot a little bit. Hopefully this is visible. The cameras are up here.
So this is the front stereo pair. And in the middle, you see a slightly off-color panel. That's IR-transparent plastic. Right behind it, we have IR LEDs that illuminate the scene at nighttime, and also in low-light conditions. It's the same thing on the back of the robot.
It's essentially the same design, mirrored. And then on top, where you have the up camera, as we call it, there's a corresponding IR panel on the other side. So these IR lights, as you can imagine, illuminate the scene in any kind of low-light or nighttime situation. Very early on, we had what are called RGB-IR sensors, where your RGB pixels are sensitive to IR wavelengths as well. And we ran into a few issues with those sensors.
The first problem was that in sunlight your camera just washes out, and you have no way to filter that signal, because you can't tell what is R and what is IR; it's all combined into one pixel. Same for G and B. Over time we realized we needed to subtract the IR signal in certain scenarios and actually pay attention to it in others. But that's not really possible without a way to get the delta between the pure RGB signal and the combined RGB-plus-IR signal. So there, the design direction split.
We could have added a physical filter for the IR light in front of the cameras. Imagine a mechanical shutter, kind of like sunglasses: put them on and IR is filtered out; remove them and you're getting RGB plus IR. But that level of mechanical complexity obviously comes with downsides. There's a lot of cost, there are a lot of reliability issues, and we're also trying to build a compact robot wherever we can, so having that much complexity didn't quite make sense. Instead, we chose a camera configuration where RGB and IR are different pixels, so we can do the filtering entirely in software.
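As a rough illustration of what that software-side filtering could look like, here's a minimal sketch. It assumes the sensor already hands you separate RGB and IR channels; the leakage constant and the low-light threshold are made-up values for illustration, not Matic's actual pipeline.

```python
import numpy as np

def correct_ir_contamination(rgb, ir, leakage=0.2):
    """Subtract an estimated IR bleed from the color channels.

    rgb: HxWx3 float array in [0, 1] from the RGB pixels
    ir:  HxW   float array in [0, 1] from the dedicated IR pixels
    leakage: hypothetical calibration constant for how much IR
             leaks into each color channel
    """
    corrected = rgb - leakage * ir[..., np.newaxis]
    return np.clip(corrected, 0.0, 1.0)

def choose_frame(rgb, ir, low_light_threshold=0.05):
    """Use IR-corrected color by day, raw IR in the dark."""
    if rgb.mean() < low_light_threshold:  # too dark for useful color
        return np.repeat(ir[..., np.newaxis], 3, axis=2)
    return correct_ir_contamination(rgb, ir)
```

Because the R, G, B, and IR values come from distinct pixels, the subtraction, or the switch to pure IR, is just array math, with no mechanical shutter involved.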
So we ended up with an RGB-plus-IR camera, which is what we use now. Yeah, that's a good question too. I think the first thing to realize is that stereo vision is very important. Like I was showing a second ago, it's not just one camera in the front; there are two. And the reason there are two cameras comes down to how depth detection works on the Matic system.
The way it works is that you're seeing every frame, or every scene rather, from two different vantage points, and each camera corresponds to one vantage point. Internally, the robot knows exactly the intrinsic and extrinsic parameters associated with what we call the stereo pair. What does that mean? The robot knows how much lens distortion is coming from the first camera and how much from the second, and it also knows the position of the cameras with respect to each other in 3D space. Because of that, when we look at the same scene from two different vantage points, we can figure out which pixel in one view corresponds to which pixel in the other.
And based on the delta in horizontal position between them, the disparity, we can calculate the depth. I know it sounds pretty complex, but it's essentially what the human brain does. If you close one eye and look at your finger, and then, without moving your finger, switch to the other eye, you'll see your finger shift a little bit. That shift is how the human brain works out how far the finger is from you, and it's the reason you don't fall down a staircase, for example, as you're walking down it. So just by mimicking that logic, and using modern computer vision and neural network techniques, we're able to build a depth map.
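To make the geometry concrete, here's the standard pinhole-stereo relation in a few lines of Python. The focal length and baseline below are made-up numbers; on a real robot they'd come out of the calibrated intrinsics and extrinsics just described.

```python
import numpy as np

FOCAL_LENGTH_PX = 700.0  # focal length in pixels (hypothetical)
BASELINE_M = 0.06        # distance between the two cameras (hypothetical)

def depth_from_disparity(disparity_px):
    """Classic stereo triangulation: depth = f * B / disparity.

    A pixel that shifts a lot between the two views (large disparity)
    is close; one that barely shifts is far away.
    """
    d = np.asarray(disparity_px, dtype=float)
    return np.where(d > 0, FOCAL_LENGTH_PX * BASELINE_M / d, np.inf)

# A 30-pixel shift maps to 700 * 0.06 / 30 = 1.4 m:
print(depth_from_disparity([30.0, 10.0, 2.0]))  # [ 1.4  4.2 21. ]
```

Run that per pixel over a full disparity map and you have the depth map.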
So having a stereo pair in the front makes sense from that perspective. Now, we could have built just the front stereo pair and not added another one in the back. However, that would mean that every time we backed out of a tight space or a tight corner, we'd literally have to turn around, or have something like a neck on the robot so it could turn its head and make sure there's nothing behind it. Otherwise it could run into something. So rather than adding that complexity, we said: cameras are cheap.
Let's just add another set of eyes to the back of Matic's design. That's how you end up with another stereo pair in the back, and that's the justification for the four cameras. The fifth camera, the up camera, is one of the most debated sensor decisions inside the company. Given the way we've thought about interaction with the robot, especially with Mehul and Navneet's background in gesture recognition, we always wanted to give Matic the ability to interact with the customer almost like a human being would.
And as you can imagine, if a human being can't see you, you're limited in how much you can communicate. They can't see your body language. They can't see where you're pointing. They can't see which way you're facing. So in order for the robot to see the human being it's communicating with, we put in an up camera.
The vision here is to execute a feature we internally call Come Here, Clean This. The idea is that a human being standing in the vicinity of the robot can simply point to a spot on the floor and speak to the robot just as you would to a person: "Hey robot, go there and clean that mess," or "Come here and clean this mess." The robot should be able to understand your gestures, cast a straight line through 3D space, and infer which part of the floor you're actually talking about. And to cover enough of that 3D space, we needed a camera on top, so the robot can actually see people's faces and where they're pointing.
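Sketching that inference: once the up camera gives you two 3D points along the pointing direction, say the person's eye and fingertip, the pointed-at spot is just a ray-plane intersection with the floor. The point names and coordinates below are hypothetical, not taken from Matic's implementation.

```python
import numpy as np

def pointed_floor_spot(eye, fingertip, floor_z=0.0):
    """Intersect the pointing ray with the floor plane z = floor_z.

    eye, fingertip: 3D points in the robot's frame, e.g. estimated
    from the up camera. Returns the (x, y) floor coordinate being
    pointed at, or None if the ray never reaches the floor.
    """
    eye = np.asarray(eye, dtype=float)
    direction = np.asarray(fingertip, dtype=float) - eye
    if abs(direction[2]) < 1e-9:   # pointing parallel to the floor
        return None
    t = (floor_z - eye[2]) / direction[2]
    if t <= 0:                     # pointing upward or behind the eye
        return None
    hit = eye + t * direction
    return float(hit[0]), float(hit[1])

# Eye at 1.6 m height, fingertip a bit lower and ahead of it:
print(pointed_floor_spot([0.0, 0.0, 1.6], [0.3, 0.1, 1.4]))  # about (2.4, 0.8)
```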
So that's how you end up with five cameras. Yeah, for sure. I think privacy has always been at the forefront for us. Many, many hardware devices out in the wild today, many of them using cameras, try to run quite complex computation. But the way that's usually done is that the inputs for those computations are collected at the edge, i.e., on the device. It could be any kind of sensor: a camera, a microphone.
Then that information gets sent to a server. The advantage of doing that is you have more compute available at the server level, so you can run complex calculations, and then the output inference gets sent back to the device. That might be the standard way of doing things, but now your information is physically distributed and more vulnerable to cybersecurity threats.
So from that perspective, very early on we decided all of the compute was going to happen on device. Keeping that as a North Star and never deviating from it, we've always chosen the hardware, the SoC, so that we can actually enable the software team to do all their computation on device. And like I said, we've gone through painful transitions to make that a reality: we completely ripped out our previous SoC and added the new one, the NVIDIA Orin Nano, to make sure that's possible. That's how we make sure your data basically never leaves the robot.
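As a caricature of the difference, the conventional pattern streams raw sensor data to a server and waits for an answer, while the on-device pattern keeps the frame in local memory and only ever produces small derived outputs. Every name in this sketch is hypothetical; it shows the shape of the design, not Matic's actual stack.

```python
# Hypothetical sketch only; nothing here reflects Matic's real code.

class OnDeviceModel:
    """Stand-in for a neural net running on the robot's SoC."""
    def infer(self, frame):
        return {"obstacle_ahead": False}  # small derived result only

def perception_step(read_frame, model):
    frame = read_frame()          # raw image exists only in device RAM
    result = model.infer(frame)   # all compute happens on the SoC
    # The frame is never serialized, uploaded, or written to disk;
    # only the tiny derived result flows on to mapping and planning.
    return result
```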
Matic never needs to collect or store any user data, definitely not images or videos. The only time we do any kind of data collection is when we get very, very explicit permission from the customer, for debugging purposes or something of that sort. And we've kept this as a sacred rule within the company: we never save any kind of customer data without permission. We take privacy very, very seriously, because we understand it's one of the most important things for the customer.