2084: Leveraging the Power of Language Models for Long-Term Planning
Two recent papers on the power of LLMs
Large language models are becoming more and more widespread lately, and no wonder: they are absurdly impressive, general programs that seem able to do almost anything. In that vein, there is an interesting recent paper in which a group of researchers, through some clever prompting, leveraged GPT-3 to perform long-term planning in Minecraft.
Essentially, they developed a four-step process they call DEPS (Describe, Explain, Plan, and Select). First, they give the LLM the task they want to achieve in natural language, prompted so that it outputs a series of goals. These goals are fed into a selector, which chooses the goal that a separate gameplay controller should pursue. When the controller encounters an issue, it sends back a description of the current game state and the problem. The LLM is then asked to generate an explanation of why the issue occurred, and that explanation is fed back into the LLM along with a command to revise the plan; the revised plan goes back to the selector, and the cycle repeats.
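The loop above can be sketched roughly as follows. This is a hypothetical illustration of the control flow, not the paper's actual code: `llm`, `selector`, and `controller` are illustrative stand-ins for the real GPT-3 calls and the Minecraft gameplay controller.

```python
# A minimal sketch of the DEPS-style plan/execute/explain/replan loop.
# All function names and the toy stubs below are illustrative assumptions.

def deps(task, llm, selector, controller, max_rounds=5):
    """Plan with an LLM, execute goals, and re-plan when execution fails."""
    # Decompose the task into an ordered list of goals.
    plan = llm(f"Plan goals for: {task}")
    for _ in range(max_rounds):
        goal = selector(plan)               # pick the next goal to pursue
        ok, description = controller(goal)  # execute; describe any failure
        if ok:
            plan.remove(goal)
            if not plan:
                return "done"
            continue
        # Ask the LLM why the goal failed, then feed the explanation back
        # in with a request to revise the plan.
        explanation = llm(f"Why did '{goal}' fail? Game state: {description}")
        plan = llm(f"Revise the plan given this explanation: {explanation}")
    return "gave up"

# Toy stand-ins just to show the control flow end to end.
def toy_llm(prompt):
    if prompt.startswith("Why"):
        return "needed wood first"
    return ["get wood", "craft pickaxe"]

def toy_selector(plan):
    return plan[0]

state = {"wood": False}
def toy_controller(goal):
    if goal == "get wood":
        state["wood"] = True
        return True, ""
    if goal == "craft pickaxe" and state["wood"]:
        return True, ""
    return False, "no wood in inventory"

print(deps("mine a diamond", toy_llm, toy_selector, toy_controller))
# prints "done"
```

A real system would replace the stubs with prompted GPT-3 calls and an actual game controller, but the shape of the loop is the same: act, describe failures, explain, and revise.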
Now this is a very human-like way to go about deciding: do something; if it doesn't work, figure out why and do it differently. It is fascinating that you can succeed at relatively long-term tasks, like mining diamonds in a fairly open-world game like Minecraft, using only GPT-3. It shows how broadly applicable LLMs are, and it raises interesting questions about the extent of their capability: if an LLM can plan the thousands of steps necessary to get diamonds, what else can it plan?
Beyond that, there is another paper, Toolformer, which demonstrates how a smaller large language model like GPT-J can, with additional training, outperform larger models at using tools such as calculators or a question answerer. The approach is to use a large language model to annotate a dataset with places where an API should be called, then actually call the relevant APIs and fill in their results where they are invoked, with special demarcation tokens marking where each API call starts and ends. This dataset is then used to finetune an LLM like GPT-J to produce output annotated with API calls. In this way, an LLM can be automatically trained to use an API. The mind boggles at the possibilities: imagine an LLM you can ask questions of that automatically calls the relevant APIs across the whole internet to resolve your queries. Heck, imagine an LLM that handles C++ tooling for you automatically; after the 20th hour of struggling with C++ tooling, an LLM that just tells you what to do would be most welcome.
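To make the annotation idea concrete, here is a hypothetical sketch of the execute-and-fill-in step: API calls embedded in the text inside demarcation tokens get executed, and their results spliced in before fine-tuning. The bracket syntax, the `apis` registry, and the `Calculator` tool here are illustrative assumptions, not the paper's exact format.

```python
# Sketch of executing embedded API calls in annotated training text.
# An LLM inserts bare calls like [Calculator(3+4)]; we run them and fill
# in the results to build the fine-tuning example.
import re

# Illustrative tool registry; a real system would have safer, richer tools.
apis = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def execute_calls(annotated_text):
    """Replace [Tool(arg)] spans with [Tool(arg) -> result] spans."""
    def fill(match):
        name, arg = match.group(1), match.group(2)
        return f"[{name}({arg}) -> {apis[name](arg)}]"
    return re.sub(r"\[(\w+)\(([^)]*)\)\]", fill, annotated_text)

sample = "There are [Calculator(18*12)] eggs in eighteen dozen."
print(execute_calls(sample))
# prints "There are [Calculator(18*12) -> 216] eggs in eighteen dozen."
```

The actual paper adds a crucial filtering step on top of this: annotated calls are kept only when including the result makes the model better at predicting the following text, which is what lets the dataset be built automatically.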
Both of these papers are interesting in that they move more and more toward how these models should actually be applied. I predict we'll see a lot more work like this in the coming days. The future is going to be great, and AI-powered.