2084: Ideas on a personal robo-assistant - KITT in real life
I was thinking about tying together some models to see whether I could create a helpful robo-assistant for my car.
If you’ve ever watched the classic series Knight Rider, you know that KITT, the car with artificial intelligence, was the true star of that show, and by far the coolest part of it. So, as I was playing around with ChatGPT, I wondered whether it would be possible to create a facsimile of KITT using the new AI models available.
Essentially, it would leverage three models: a speech-to-text model, a language processing model, and a text-to-speech model for the output. For the speech-to-text part, I’d use OpenAI’s Whisper, since it’s quite fast and reliable. For the language processing part I’ll use GPT-3, and for the text-to-speech part I’ll use Coqui, which allows you to use audio files to adjust the resulting voice.
Now the speech-to-text part will be rather basic - Whisper, and I’ll use the punctuation it generates to mark the ends of sentences. Since it’s a transformer, it decodes its output token by token, though it works on chunks of audio rather than a truly continuous stream, so I’d feed it short chunks to keep it relatively responsive. It also comes in various sizes, so I can try them out until I find one where the inference time isn’t too bad.
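To make the sentence-marking idea concrete, here is a minimal sketch of how transcript chunks could be cut into sentences using punctuation. It assumes the transcript arrives as plain text chunks; the SentenceBuffer class and its method names are my own placeholders, not part of Whisper.

```python
import re

# Whitespace that directly follows sentence-ending punctuation.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Collects transcript chunks and hands back complete sentences.

    Hypothetical helper: it only sees text, so whatever produces the
    transcript (Whisper in the plan above) stays outside this class.
    """

    def __init__(self):
        self._pending = ""

    def feed(self, text):
        """Add a chunk of transcript; return the sentences it completed."""
        self._pending += text
        parts = _SENTENCE_BREAK.split(self._pending)
        # The last piece may be an unfinished sentence, so keep it for later.
        self._pending = parts.pop()
        return [p.strip() for p in parts if p.strip()]

    def flush(self):
        """Return whatever is left, e.g. when the speaker goes quiet."""
        rest, self._pending = self._pending.strip(), ""
        return [rest] if rest else []
```

A sentence is only released once some whitespace follows its punctuation, so the final sentence of an utterance needs a flush() call, for example after a pause in the audio.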
Now, I’ll process each sentence I get from Whisper. For the GPT-3 part, I’ll add a blurb at the beginning of the transmission that sets up the AI assistant and gives it some basic personality, ya know, something like “Pretend you are a snarky AI assistant and respond to the following sentences with what the assistant would say, and wrap your replies in exclamation marks !<dialogue>!”.
In addition, I’ll add a list of basic commands it can execute, stuff like “play music”, “call a person” etc, together with an instruction, something like “if you feel that the dialogue calls for it, execute one of the following commands by outputting the keyword and a target”, followed by the list of commands. Essentially, this gives the model a set of options to choose from.
And then I’ll progressively append the sentences I get from Whisper, and record the replies, until I have a sufficiently large string, at which point I will ask GPT-3 to summarize the conversation into a short 100 word paragraph, and append that to the blurb at the start. Essentially, in this way, I’ll give it a bit of memory to make it more personable.
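The blurb, command list, and rolling summary described above could be kept together roughly like this. It is only a sketch under a few assumptions: the Conversation class is a made-up name, the character budget stands in for “sufficiently large”, folding the previous summary into the next one is my own choice, and summarize is a stand-in callable that would wrap the actual GPT-3 request.

```python
class Conversation:
    """Builds the prompt and compresses old turns into a short memory.

    Hypothetical sketch: `summarize` is any callable taking the
    transcript text and returning a short summary string.
    """

    def __init__(self, persona, commands, summarize, max_chars=2000):
        self.persona = persona
        self.commands = commands
        self.summarize = summarize
        self.max_chars = max_chars  # assumed budget before summarizing
        self.memory = ""            # latest summary, empty at the start
        self.turns = []             # turns since the last summary

    def header(self):
        """Persona blurb, command list, and the summary if there is one."""
        lines = [
            self.persona,
            "If you feel that the dialogue calls for it, execute one of the "
            "following commands by outputting the keyword and a target:",
        ]
        lines += [f"- {command}" for command in self.commands]
        if self.memory:
            lines.append("Summary of the conversation so far: " + self.memory)
        return "\n".join(lines)

    def prompt(self):
        """Full text to send to the language model."""
        return self.header() + "\n\n" + "\n".join(self.turns)

    def add_turn(self, speaker, text):
        """Record a turn; summarize once the transcript grows too long."""
        self.turns.append(f"{speaker}: {text}")
        if sum(len(turn) for turn in self.turns) > self.max_chars:
            transcript = "\n".join(self.turns)
            # Include the old summary so earlier context is not lost.
            self.memory = self.summarize((self.memory + "\n" + transcript).strip())
            self.turns = []
```

Passing the summarizer in as a plain function keeps the bookkeeping testable without any network calls.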
Now for each of the replies it generates, if it’s a command, I’ll execute it using a Python script or something, and if it’s a sentence, I’ll feed it through to Coqui to say.
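Here is one way the split between commands and speech might look. The reply format is an assumption on my part: dialogue wrapped in exclamation marks as in the prompt above, and each command on its own line as a keyword followed by a target. The route_reply function, the handlers mapping, and the speak callable are hypothetical names; speak is where the Coqui call would go.

```python
import re

# Text wrapped in exclamation marks, e.g. !Sure thing.!
_DIALOGUE = re.compile(r"!([^!]+)!")


def route_reply(reply, handlers, speak):
    """Send dialogue to `speak` and command lines to their handlers.

    `handlers` maps a lowercase command keyword to a callable taking the
    target string. Returns the list of actions taken, in order.
    """
    actions = []
    for line in reply.splitlines():
        line = line.strip()
        if not line:
            continue
        spoken = _DIALOGUE.findall(line)
        if spoken:
            for sentence in spoken:
                sentence = sentence.strip()
                speak(sentence)
                actions.append(("say", sentence))
            continue
        for keyword, handler in handlers.items():
            if line.lower().startswith(keyword):
                target = line[len(keyword):].strip()
                handler(target)
                actions.append((keyword, target))
                break
    return actions
```

One limitation of the exclamation-mark wrapper: a reply that contains its own exclamation marks gets cut into pieces, so a less common delimiter may be worth considering. Lines that are neither dialogue nor a known command are silently ignored here.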
I might have to look into using Whisper to detect when I interrupt it mid-sentence, and implementing some sort of exit function so it stops talking, to make it sound more natural.
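A very rough sketch of such an exit function: speak one sentence at a time and check a flag in between. It assumes the listening side sets a threading.Event when it hears me talk; speak_until_interrupted and its parameters are made-up names, and this only stops between sentences, not in the middle of one.

```python
import threading


def speak_until_interrupted(sentences, speak, interrupted):
    """Speak sentences in order, stopping once `interrupted` is set.

    `speak` is any callable that says one sentence; `interrupted` is a
    threading.Event set by whatever is listening to the microphone.
    Returns the sentences that were actually spoken.
    """
    spoken = []
    for sentence in sentences:
        if interrupted.is_set():
            break
        speak(sentence)
        spoken.append(sentence)
    return spoken
```

Cutting off audio mid-sentence would need support from the playback side, which this sketch does not cover.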
This should, if executed properly, create something that acts a lot like an AI assistant. Of course, I haven’t implemented this yet since I’ve had a lot of work to do, but it could be an interesting project. It would be fun to set up an Arduino board too, with a wifi board, a speaker and a microphone, and attach it to my car’s dashboard, to have my own KITT to talk to as I drive long distances. Perhaps in 2084 all cars will have something like this. Or perhaps, everything will have something like this.