Google AI released a fascinating new paper called RT-1, showing off their new Robotics Transformer system. The system itself is essentially a streamlined transformer architecture that can make decisions about 3 times a second, and the robot it runs on isn't the most complicated robot ever. What makes it fascinating is that it is a text-to-robot-command system: it takes in a real-time camera feed along with a text instruction, and dynamically synthesizes the series of steps and commands needed to carry out that instruction, even in distracting environments or environments it has never observed before, like an unfamiliar kitchen. It is fundamentally generalizable.
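As a rough mental model (not code from the paper), the inference side could look something like the sketch below. Everything here is hypothetical: `Rt1StylePolicy`, the `robot` object and its methods, and `run_policy` are illustration-only names, and the actual model, tokenizers, and robot API are not shown.

```python
# Minimal sketch of a closed-loop "camera frame + text instruction -> action"
# controller running at roughly 3 decisions per second. All names are
# hypothetical placeholders, not the RT-1 implementation.
import time


class Rt1StylePolicy:
    """Hypothetical policy: maps (camera image, text instruction) -> action."""

    def predict(self, image, instruction: str):
        # In the real system a transformer would consume image tokens plus the
        # embedded instruction and decode discretized action tokens
        # (arm motion, gripper state, base motion). Stubbed out here.
        raise NotImplementedError


def run_policy(policy: Rt1StylePolicy, robot, instruction: str, hz: float = 3.0):
    """Run the policy in a loop at roughly `hz` decisions per second."""
    period = 1.0 / hz
    while not robot.task_done():
        start = time.monotonic()
        image = robot.get_camera_image()        # latest RGB frame
        action = policy.predict(image, instruction)
        robot.apply(action)                     # send the command to the robot
        # Sleep off the remainder of the control period (~330 ms at 3 Hz).
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```

The point of the loop structure is that the language instruction stays fixed while the visual input is refreshed every cycle, which is what lets the same command keep working as the scene changes around the robot.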
Now the most interesting part, behind its generalization, is how it was trained: it was trained on a large, diverse dataset that included quite different robot mechanisms and tasks, unrelated to the actual mechanism used by the robot, and yet it was still able to generalize from the data it was shown to the specific mechanism it was deployed on. This is important because it points to the possibility of some company building a base model, a massive, heavily trained model that can adapt to a huge variety of tasks and then be further adapted to a specific goal. Inference speed is arguably still an issue, but then again a robot chassis has more than enough space to fit several GPUs; I think the current researchers are simply thinking at too small a scale.
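To make the "base model plus task-specific adaptation" idea concrete, here is a minimal, hypothetical sketch in PyTorch: a pretrained base policy is frozen and only a small embodiment-specific head is fine-tuned on data from the target robot. None of these names (`EmbodimentHead`, `adapt_to_robot`, the shape of `base_policy`) come from RT-1; this is just one plausible way such adaptation could be wired up.

```python
# Hypothetical sketch: freeze a pretrained base policy, fine-tune a small
# head for a specific robot's action space.
import torch
import torch.nn as nn


class EmbodimentHead(nn.Module):
    """Small head mapping the base policy's features to this robot's actions."""

    def __init__(self, feature_dim: int, action_dim: int):
        super().__init__()
        self.proj = nn.Linear(feature_dim, action_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.proj(features)


def adapt_to_robot(base_policy: nn.Module, head: EmbodimentHead, dataloader, epochs: int = 5):
    """Freeze the pretrained base and train only the embodiment-specific head."""
    for p in base_policy.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for images, instructions, target_actions in dataloader:
            features = base_policy(images, instructions)  # frozen feature extractor
            pred = head(features)
            loss = loss_fn(pred, target_actions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Whether the right adaptation mechanism is a frozen backbone, full fine-tuning, or something else entirely is an open question; the sketch is only meant to show how cheap the per-robot step could be relative to training the base model.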
Based on the success of RT-1, I think it can't be long before we see next-level Roombas that can perform any domestic task you desire. In 2084, doing your own work might be a thing of the past.