If you are considering using more or better data, I would recommend this source:
https://www.financialdatasets.ai/
Did you use separate train, validation and test datasets, or did you use the test dataset for both validation and testing?
10% of the dataset was used for validation. Stock tickers from dates outside the dataset's date range were also used for testing. Qwen2.5-1.5B-Instruct generalized surprisingly well, far beyond what was expected. The next model being worked on is Qwen2.5-14B-Instruct, and the newest trainer has far more engineering behind it than the first attempt.
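For anyone wanting to reproduce a similar evaluation setup, here is a minimal sketch of the split described above, with a 10% validation hold-out plus a test set drawn from dates outside the dataset's range. It assumes each record is a dict with hypothetical `ticker` and `date` fields and uses a simple seeded random hold-out; the reply does not say how the 10% was actually chosen, and the function names are illustrative only.

```python
import random
from datetime import date


def train_val_split(records, val_frac=0.1, seed=0):
    """Shuffle a copy of the records and hold out val_frac of them for validation.

    Assumption: a plain random hold-out; the original setup may differ.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    return shuffled[n_val:], shuffled[:n_val]


def out_of_range_test_set(candidates, dataset_records):
    """Keep only candidate records dated outside the full dataset's date range."""
    start = min(r["date"] for r in dataset_records)
    end = max(r["date"] for r in dataset_records)
    return [r for r in candidates if r["date"] < start or r["date"] > end]


# Hypothetical example data: 20 daily records for one ticker.
records = [{"ticker": "AAA", "date": date(2024, 1, d)} for d in range(1, 21)]
train, val = train_val_split(records)

candidates = [
    {"ticker": "BBB", "date": date(2023, 12, 31)},  # before the range: kept
    {"ticker": "CCC", "date": date(2024, 1, 10)},   # inside the range: dropped
    {"ticker": "DDD", "date": date(2024, 2, 1)},    # after the range: kept
]
test = out_of_range_test_set(candidates, records)
print(len(train), len(val), [r["ticker"] for r in test])
```

Because the test records fall entirely outside the dates the model saw, they give a rough check on generalization rather than memorization. For time series, a chronological validation split is often preferred over a random one to avoid leakage between neighbouring days.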
Check out the work in progress below. The latest WIP GRPO trainer is Frieza QLoRA_4Bit.py. It fits on a single A100 GPU:
https://github.com/IYamHim/Ginyu-Unit/
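As a rough back-of-envelope check on why 4-bit QLoRA makes a 14B model practical on one A100, the sketch below estimates the memory taken by the model weights alone. It assumes a nominal 14e9 parameters (taken from the model name; the real count differs somewhat) and ignores quantization overhead, layers kept in higher precision, LoRA adapters, optimizer state, activations, and the KV cache used during GRPO generation, all of which add to the real footprint.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory for model weights only, in GiB.

    Deliberately ignores adapters, optimizer state, activations and KV cache.
    """
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 14e9  # nominal size from the model name (assumption)

for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
# 16-bit weights: 26.1 GiB
# 4-bit weights: 6.5 GiB
```

Cutting the frozen base weights to roughly a quarter of their 16-bit size leaves most of an A100's 40 or 80 GB free for the parts that actually grow during training, such as the sampled completions for each GRPO group.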
Is it possible to run this on multiple GPUs?
Yup yup, but it takes a bit of doing
Thank you for this amazing tutorial and the insights. Can I run this repo on your dataset, or do I need to change something? This repo is about multi-GPU GRPO:
https://github.com/Jiayi-Pan/TinyZero