
Posts: 3 · Comments: 14 · Joined: 3 yr. ago

  • Ah. Sorry, good thing I attached the related link.

  • Everybody has been speculating about R2, so releasing this was kind of unexpected.

  • So something like

        Previously the text talked about [last summary]
        [The instruction prompt]...
        [Current chunk/paragraphs]
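    The layout above can be sketched as a rolling-summary loop. This is a minimal sketch, not a real implementation: `summarize()` is a hypothetical placeholder for an actual LLM call, and the function names are illustrative only.

    ```python
    # Minimal sketch of the chunked, rolling-summary prompt layout above.
    # build_prompt mirrors the three-part template; summarize() is a
    # hypothetical stand-in for a real LLM API call.

    def build_prompt(last_summary: str, instruction: str, chunk: str) -> str:
        """Assemble: prior summary, instruction, then the current chunk."""
        return (
            f"Previously the text talked about: {last_summary}\n"
            f"{instruction}\n"
            f"{chunk}"
        )

    def summarize(prompt: str) -> str:
        # Placeholder: swap in a real model call here.
        return prompt.splitlines()[-1][:80]

    def rolling_summarize(chunks, instruction="Summarize the text below."):
        """Feed each chunk together with the running summary of what came before."""
        summary = "(nothing yet)"
        for chunk in chunks:
            summary = summarize(build_prompt(summary, instruction, chunk))
        return summary
    ```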
  • The RL is so good that Grok changed its personality just from a small change to its system prompt.

  • Llama 3.3 was good, though. For multimodality, Llama 4 also uses the Llama 3.2 approach, where image and text are handled by a single model instead of using CLIP or SigLIP.

  • They have the whole Twitter database. It's kind of the same with Gemini. But somehow Meta isn't catching up; maybe their Llama 4 architecture isn't that stable to train.

  • It changed after Grok 3

  • LocalLLaMA @sh.itjust.works

    When DeepSeek V4 and R2?

  • Lots of developers chose to write in CUDA, as ROCm support back then was a mess.

  • No, you can run SD- and Flux-based models inside koboldcpp. You can try it out using the original koboldcpp in Google Colab. It loads GGUF models. Related discussion on Reddit: https://www.reddit.com/r/StableDiffusion/comments/1gsdygl/koboldcpp_now_supports_generating_images_locally/

    Edit: Sorry, I kind of missed the point; I was probably sleepy when writing that comment. Yeah, I agree that LLMs need a lot of memory to run, which is one of their downsides. I remember someone comparing costs and finding that an API with token-based pricing is cheaper than running the model locally. But running image generation locally is cheaper than an API with step+megapixel pricing.
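    The cost contrast above can be made concrete with a back-of-envelope sketch. Every number below (API token price, per-image price, GPU wattage, throughput, electricity rate) is a hypothetical placeholder, and local hardware purchase/amortization is deliberately ignored:

    ```python
    # Back-of-envelope cost comparison. All figures are hypothetical
    # placeholders; local hardware cost is deliberately ignored.

    API_PRICE_PER_MTOK = 0.50     # $ per million tokens (hypothetical)
    API_PRICE_PER_IMAGE = 0.04    # $ per image, step+megapixel style (hypothetical)
    ELECTRICITY_PER_KWH = 0.15    # $ per kWh (hypothetical)
    GPU_WATTS = 300               # GPU draw while generating (hypothetical)
    LOCAL_TOK_PER_SEC = 30        # local decode throughput (hypothetical)
    SECS_PER_IMAGE = 10           # local image-generation time (hypothetical)

    def api_text_cost(tokens: int) -> float:
        """Token-priced API cost in dollars."""
        return tokens / 1_000_000 * API_PRICE_PER_MTOK

    def local_text_cost(tokens: int) -> float:
        """Electricity-only cost of generating the same tokens locally."""
        hours = tokens / LOCAL_TOK_PER_SEC / 3600
        return hours * (GPU_WATTS / 1000) * ELECTRICITY_PER_KWH

    def api_image_cost(images: int) -> float:
        """Per-image API pricing."""
        return images * API_PRICE_PER_IMAGE

    def local_image_cost(images: int) -> float:
        """Electricity-only cost of generating images locally."""
        hours = images * SECS_PER_IMAGE / 3600
        return hours * (GPU_WATTS / 1000) * ELECTRICITY_PER_KWH
    ```

    With these placeholder numbers, local image generation comes out far cheaper per image than the API, while for text the electricity cost alone is already close to the token-priced API, before hardware costs are even counted.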

  • Skywork downfall

  • LocalLLaMA @sh.itjust.works

    MindLink-32B and MindLink-72B available on Huggingface