So fascinating, cannot wait to see the results!
Hey bro, any updates on this? I've been sort of keeping my eye on this space and I'm interested in getting into it.
If you need more computing power and there's a way to help remotely, I've got a 3080 Ti.
Yes, I am also interested in this! 😄
Is there any news on this?
Is it still training? ☺️
Will make a post soon about this, but the TL;DR is: I trained it on and off, but it kept crashing with CUDA out-of-memory errors, so this week or next I'll need to bite the bullet and really dive into the distributed training code so I can a) run it continuously for a week and b) use larger models. The initial results always look promising, but then I leave it running for a bit and it crashes, which is super frustrating.
Specifically, the way the GRPO trainer class is written makes it very prone to out-of-memory errors, so I might have to extract the training loop from it altogether.
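(For anyone fighting the same OOM errors, these are the knobs I'd poke at first. This is a rough sketch against TRL's GRPOConfig; the values are guesses, not what's in my actual script:)

# Memory-leaning GRPO settings -- illustrative values, not my real config.
from trl import GRPOConfig, GRPOTrainer

config = GRPOConfig(
    output_dir="grpo_oom_test",
    per_device_train_batch_size=4,     # TRL wants the batch divisible by the group size below
    num_generations=4,                 # fewer completions per prompt = fewer logits in memory at once
    gradient_accumulation_steps=4,     # recover effective batch size without extra VRAM
    gradient_checkpointing=True,       # trade compute for memory
    max_prompt_length=256,             # hard caps on sequence length dominate peak memory
    max_completion_length=128,
    bf16=True,                         # or fp16=True, depending on the GPU
)
# trainer = GRPOTrainer(model=model, args=config, reward_funcs=[reward_fn], train_dataset=train_ds)
# trainer.train()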
I got it working, it's been training for the last 23.43 hours and I'm almost done. I used the V2 dataset. I can provide you the code I used to get it to work if you'd like...
Yeah, that would be great!
Sure thing. It won't let me paste the code here since it's too long, so I made a GitHub repository at https://github.com/IYamHim/Stonk-Market. That is the exact code I used for this run, but admittedly I need to update a couple of variables: "learning_rate=2e-4" should be "learning_rate=5e-4" and "dataloader_num_workers=0" should be "dataloader_num_workers=4". I just uploaded the environment.yml and requirements.txt to make things easier for whatever virtual environment you prefer.
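(In other words, the relevant bit of the training arguments ends up looking roughly like this; everything besides those two values is a placeholder, not the repo's exact config:)

# Only the two changed values matter here; the rest is illustrative.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen_stock_advisor_final1",   # placeholder output dir
    learning_rate=5e-4,                        # was 2e-4 in the uploaded script
    dataloader_num_workers=4,                  # was 0; parallel workers keep the GPU fed
    # ...the rest of the arguments as in train_qwen_grpo.py...
)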
P.S. here is my current progress, I'm in the home stretch!:
{'loss': 22.4182, 'grad_norm': 0.0, 'learning_rate': 3.127171646977067e-05, 'epoch': 0.85}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 3.057678943710911e-05, 'epoch': 0.85}
{'loss': 0.0196, 'grad_norm': 0.0001766916102496907, 'learning_rate': 2.9881862404447535e-05, 'epoch': 0.86}
{'loss': 22.8465, 'grad_norm': 0.0, 'learning_rate': 2.918693537178596e-05, 'epoch': 0.86}
86%|██████████████████████████████████████████████████▋ | 2550/2968 [25:20:24<3:27:22, 29.77s/it
Dude, thanks so much man, I'll take a look at it.
Finished, but don't know how good it is. I'll need to do some testing:
100%|█████████████████████████████████████████████████████████████| 2968/2968 [28:47:41<00:00, 34.93s/it]
Training completed successfully
Saving model...
Model saved to: /home/2084Collective_deepstock-sp500-companies-with-info-and-user-prompt/train/qwen_stock_advisor_final1
Final GPU memory:
GPU Memory allocated: 1.53GB
GPU Memory reserved: 4.35GB
GPU 0 Memory:
Total: 11.00GB
Free: 4.35GB
Used: 1.53GB
CUDA available: True
Current device: 0
Testing model...
/home/ai/miniconda3/envs/trading_env/lib/python3.10/site-packages/torch/utils/checkpoint.py:87: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
<edited out the AI response> I don't want people to think these outputs are real (they definitely are not), so I just removed them for clarity.
I put this on Reddit but thought it'd be nice to note here too...
I've just been looking at some of the distilled models for sentiment analysis for the brokerage firm I'm working for. One major thing to watch out for: repeatability. With an (admittedly heavily cut-down) R1 distill, 14B Q4 Llama (the 7B behaved the same), I was finding it not to be repeatable. I ran the same prompt and press release through it 5 times (a press release on a medication; the stock got a nice price bump when it came out). I told it to rate the likelihood of a price bump out of 10, with 0 being worst and 10 being best, and I got 5 different scores: 9.0, 8.0, 8.5, 7.5, and even a 5.0 on one run. Just something to look out for; obviously that much variability makes a signal from that model rather useless.
I tested the R1 Qwen 14B Q6_L distill (I thought going from Q4 to Q6 might help, since I've read that reduces perplexity quite a bit) and it also gave inconsistent ratings from run to run.
I of course don't know if this is because the model I ran is so shrunk down, or if it's just an intentional characteristic of R1; after all, for creative writing or coding, getting different text or a different coding solution (as long as it's correct!) would be a positive sign of creativity, not a negative. And for chat, having the responses vary would of course be a positive too.
So watch out, and make sure (whether you get your model nicely trained or not) that its buy/don't-buy calls are at least reasonably consistent from one run to the next.
I found Qwen 2.5 14B Q6_L at least consistently gave an 8.5 (over 5 runs, at least). I don't know yet if it's actually good at this kind of thing, but at least it's consistent.
I then made the press release negative but rather deranged (I put "not"s and "in"s into it so the positive results became negative: it was ineffective, side effects were not tolerable, etc.), but left in the bit at the end where they were pursuing stage 2 trials and FDA approval. It rated that a 2.5, and commented that it didn't rate it even lower because the plans for stage 2 trials and FDA approval might keep the stock price steady rather than dropping.
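(If anyone wants to run the same sanity check, here's roughly the loop I mean, pointed at a local OpenAI-compatible endpoint; the URL, model name, and score parsing are placeholders, not my exact setup:)

# Rough repeatability check -- endpoint, model name, and regex are placeholders.
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # e.g. a local server

PROMPT = (
    "Rate the likelihood of a short-term price bump from 0.0 (worst) to 10.0 (best). "
    "Reply with just the number.\n\nPress release:\n{press_release}"
)

def score_once(press_release: str) -> float:
    resp = client.chat.completions.create(
        model="qwen2.5:14b",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(press_release=press_release)}],
        temperature=0.0,
    )
    # Pull the first number out of the reply; real parsing would need to be more defensive.
    return float(re.search(r"\d+(?:\.\d+)?", resp.choices[0].message.content).group())

scores = [score_once(open("press_release.txt").read()) for _ in range(5)]
print(scores, "spread:", max(scores) - min(scores))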
I second Gail's comment... I'm also fascinated to see the results, and I subscribed to your substack to see how it goes.
That's totally fair, yeah - the RL objective actually generates 20 completions from the same prompt and takes the average reward over all of them, so hopefully that helps with the variance.
That was my plan B, and I don't see why it wouldn't work well: run it several times and average the scores. Even if your most positive cases end up at 7.5 or 8 and the negative ones at 2 or 2.5, that's still a solid 5 or 6 points of range to pick out the buys and strong buys.
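(Conceptually it's something like the sketch below: rewards get averaged within each group of completions and every completion is scored relative to its group mean. This is just my shorthand for the idea, not the trainer's actual code.)

# Toy sketch of group-relative rewards (the idea behind GRPO), not the trainer's implementation.
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (num_prompts, num_generations) raw rewards, e.g. num_generations=20 in the run above."""
    mean = rewards.mean(dim=1, keepdim=True)             # per-prompt average over the group
    std = rewards.std(dim=1, keepdim=True).clamp_min(1e-4)
    return (rewards - mean) / std                         # each completion scored relative to its group

rewards = torch.tensor([[7.5, 8.0, 9.0, 5.0, 8.5]])       # noisy per-completion rewards for one prompt
print(group_relative_advantages(rewards))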
Try a larger quant size. I've read that anything under Q8 on the distilled models has adverse effects on the output. Other models are okay at lower quants (Phi-4, for example, gives consistent responses at Q3_K_L).
I had wondered about that too (and went from Q4 to Q6, or actually a Q6_K_L at least). I could see a model having trouble keeping a "head for numbers" if it's quantized too far. After all, Q4 only allows 16 distinct values per weight and Q2 only 4. I could see a Q4 model having trouble with "rate from 0.0 to 10.0", where going by 0.5s there are 21 different values. Of course, a model designed with heavy quantization in mind could just represent numerical values in binary, I suppose, but these models aren't trying to do that as far as I know. I imagine running these at Q1.58 or whatever, Q2, or Q4 is analogous to reasoning through these things when your head's a bit fuzzy: you'll get an answer, but it might not be correct, and you might not get the same answer twice.
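(The back-of-the-envelope version of that argument, treating each bit-width naively as uniform levels, which real K-quants with their block scales aren't exactly:)

# Naive levels-per-weight arithmetic; only the intuition, not how K-quants actually work.
rating_steps = int(10.0 / 0.5) + 1     # 0.0, 0.5, ..., 10.0 -> 21 distinct ratings
for bits in (2, 4, 6, 8):
    print(f"Q{bits}: {2**bits} levels per weight vs {rating_steps} rating steps")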
TBH it took me a couple of reads of your comment to fully grasp the analogy, but I agree. I have noticed more nonsense thinking loops when using Q4, as if their thoughts are foggy. With the Q8 version they have a clearer mind and give much better results.
I also got this code running locally with a Qwen2.5 1.5B model on a GTX 1080 Ti (11GB VRAM). Getting through 2000 epochs on a sample of 10,000 (not the full ~300,000) training examples takes about 8 hours. From the GRPO training documentation on Unsloth, it looks like around 1.5k epochs is about the ceiling for RL-training models. I'm doing a small test run of 296 epochs now to see how it goes. Unfortunately, I had to use 4-bit instead of 8-bit due to memory constraints. Here are my outputs so far (wandb was causing issues; I'll re-enable it after this test run):
(trading_env) ai@home:/home/2084Collective_deepstock-sp500-companies-with-info-and-user-prompt/train$ python train_qwen_grpo.py
Initial GPU memory:
GPU Memory allocated: 0.00GB
GPU Memory reserved: 0.00GB
GPU 0 Memory:
Total: 11.00GB
Free: 0.00GB
Used: 0.00GB
CUDA available: True
Current device: 0
Loading model and tokenizer...
Initializing model...
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Preparing model for training...
Preparing training data...
Loading dataset...
Generating train split: 100%|███████████████████████████████████| 305860/305860 [00:02<00:00, 126530.40 examples/s]
Dataset loaded with 305860 examples
Processing example 0/305860
First example processed successfully!
Input IDs length: 512
Labels length: 512
Processing example 1000/305860
Processing example 2000/305860
Processing example 3000/305860
Processing example 4000/305860
Processing example 5000/305860
Processing example 6000/305860
Processing example 7000/305860
Processing example 8000/305860
Processing example 9000/305860
Reached example limit
Successfully processed 10000 examples
Training dataset size: 9500
Evaluation dataset size: 500
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
Starting training...
0%| | 0/296 [00:00<?, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.2222222222222223e-05, 'epoch': 0.0}
{'loss': 39.4286, 'grad_norm': 0.0, 'learning_rate': 0.00019930313588850174, 'epoch': 0.03}
5%|███▊ | 15/296 [07:22<2:18:11, 29.51s/it
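(For anyone on a similar 11GB card, the 4-bit setup I mean is roughly the following. Treat it as a sketch: the model and dataset ids are guesses based on the local folder name, and the exact arguments in train_qwen_grpo.py may differ:)

# Rough 4-bit + subsample setup for an 11GB card; values are illustrative, not the repo's exact config.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 8-bit would be nicer, but it didn't fit in 11GB
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "Qwen/Qwen2.5-1.5B-Instruct"      # placeholder; whichever 1.5B variant you're using
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Sample 10,000 of the ~305,860 examples so one run fits in a reasonable time on the 1080 Ti.
# Dataset id guessed from the local folder name; point this at whichever copy you're using.
dataset = load_dataset("2084Collective/deepstock-sp500-companies-with-info-and-user-prompt", split="train")
dataset = dataset.shuffle(seed=42).select(range(10_000))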