There was an odd paper I read recently on Hungry Hungry Hippos, or rather the use of state space models for language generation instead of transformers. The appeal is that state space models scale linearly in the input length rather than quadratically like transformers, although they aren't nearly as effective. Still, they're pretty interesting to read about, and given their superior theoretical efficiency, something to keep an eye on for the future.
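To make the scaling point concrete, here's a quick toy sketch in Python (my own illustration, not anything from the paper's actual architecture): a diagonal SSM recurrence only has to sweep the sequence once, while vanilla self-attention materializes an L×L score matrix along the way.

```python
# Rough toy sketch: why an SSM recurrence is O(L) per layer while
# vanilla self-attention is O(L^2) in sequence length L.
import numpy as np

def ssm_scan(u, a, b, c):
    """Diagonal state space model: x_t = a * x_{t-1} + b * u_t, y_t = c . x_t.
    One pass over the sequence -> linear in L."""
    d = a.shape[0]
    x = np.zeros(d)
    ys = []
    for u_t in u:                      # L steps
        x = a * x + b * u_t            # O(d) work per step
        ys.append(c @ x)
    return np.array(ys)

def naive_attention(q, k, v):
    """Single-head attention: the L x L score matrix is the quadratic part."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (L, L) -> O(L^2) time and memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

L, d = 1024, 16
rng = np.random.default_rng(0)
u = rng.normal(size=L)                              # scalar input sequence
a = rng.uniform(0.5, 0.99, d)                       # decay per state dimension
b, c = rng.normal(size=d), rng.normal(size=d)
q = k = v = rng.normal(size=(L, d))

print(ssm_scan(u, a, b, c).shape)      # (1024,)    -- one linear-time pass
print(naive_attention(q, k, v).shape)  # (1024, 16) -- built a 1024x1024 matrix to get here
```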