The last article on GliGen, the annotated generation model, made me think: what if you used the same technique to generate PowerPoint slides instead? There are a lot of PowerPoint decks on SlideShare that you could scrape (there is in fact a dataset that did that). You could then use an OCR system to record where the text is on each slide and what it says, add an image captioning model like this one, and use the resulting annotated dataset as a base to generate from. That would give you PowerPoint slides that are mostly computer generated, but with some control over the composition, graphics, text and such. Since PowerPoint slides generally tend not to be that complex, it could even be done with a simpler model. Definitely something that I want to try in the future.
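To make the annotation step a bit more concrete, here is a minimal sketch of what one record in such a dataset might look like. Everything in it is an assumption for illustration: `OcrBox` stands in for whatever a real OCR system would return, the caption is typed by hand rather than produced by a captioning model, and the phrase-plus-normalized-box layout is only my guess at a convenient format for a grounded generation model, not the format any particular model requires.

```python
from dataclasses import dataclass


@dataclass
class OcrBox:
    """One text region as a hypothetical OCR system might report it (pixels)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def normalize_box(box, slide_width, slide_height):
    """Convert a pixel box to [x0, y0, x1, y1] as fractions of the slide size."""
    x0 = box.left / slide_width
    y0 = box.top / slide_height
    x1 = (box.left + box.width) / slide_width
    y1 = (box.top + box.height) / slide_height
    # Clamp so boxes that spill over the edge stay inside the slide.
    return [min(max(v, 0.0), 1.0) for v in (x0, y0, x1, y1)]


def build_annotation(caption, ocr_boxes, slide_width, slide_height):
    """Combine a slide caption and its OCR boxes into one dataset record."""
    regions = [
        {"phrase": b.text.strip(), "box": normalize_box(b, slide_width, slide_height)}
        for b in ocr_boxes
        if b.text.strip()  # skip empty or whitespace-only detections
    ]
    return {"caption": caption, "regions": regions}


record = build_annotation(
    caption="a title slide with a bar chart on the right",  # hand-written stand-in
    ocr_boxes=[
        OcrBox("Quarterly Results", left=96, top=54, width=768, height=108),
        OcrBox("   ", left=0, top=0, width=10, height=10),  # noise detection
    ],
    slide_width=960,
    slide_height=540,
)
print(record)
```

Normalizing the boxes to fractions of the slide size means slides scraped at different resolutions end up in the same coordinate space, which seems like the simplest thing to do before worrying about the model itself.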