Posts: 10 · Comments: 14 · Joined 2 mo. ago
  • Technically it supports fewer languages than Whisper, 40 vs. 99.

    The main problem isn't "bother", it's training data. You need hundreds of thousands of hours of high-quality transcripts to train models like these, and that just doesn't exist for, like, Zulu or whatever.

  • LocalLLaMA @sh.itjust.works
    morrowind @lemm.ee
  • I want to clarify something: "reranker" is a general term that can refer to any model used for reranking. It is independent of implementation.

    What you refer to:

    > because reranker models look at the two pieces of content simultaneously and can be fine-tuned to the domain in question. They shouldn't be used for the initial retrieval because the evaluation time is O(n²), as each combination of inputs has to be scored

    is a specific implementation, known as a cross-encoder, that is common for reranking models but not retrieval ones, for the reasons you described. But you can also use any other architecture.
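
    This distinction shows up directly in code. Below is a minimal sketch of cross-encoder reranking using the CrossEncoder class from the sentence-transformers library; the checkpoint name and toy documents are examples, not a recommendation.

    ```python
    from sentence_transformers import CrossEncoder

    # Example cross-encoder checkpoint; any cross-encoder model (or one
    # fine-tuned to your domain) is used the same way.
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    query = "how do rerankers differ from retrievers?"
    candidates = [
        "Bi-encoders embed query and document separately, so retrieval is fast.",
        "Cross-encoders score the query and document jointly, one pair per forward pass.",
        "Reranking reorders a small candidate list by relevance.",
    ]

    # The model sees each (query, document) pair together. Scoring every
    # query-document combination of a corpus this way is what makes
    # cross-encoders too slow for initial retrieval, but fine for
    # reranking a short list.
    scores = model.predict([(query, doc) for doc in candidates])
    for score, doc in sorted(zip(scores, candidates), reverse=True):
        print(f"{score:.3f}  {doc}")
    ```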

  • LocalLLaMA @sh.itjust.works
    morrowind @lemm.ee

Sentence Transformers v4

    LocalLLaMA @sh.itjust.works
    morrowind @lemm.ee

    NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms

  • Claude frequently draws SVGs to illustrate things for me (I'm guessing it's in the prompt), but even though it's better at it than all the other models, it still kinda sucks. It's just a fundamentally dumb task for a purely language model, similar to the ARC-AGI benchmark; it just makes more sense for a vision model, and trying to get an LLM to do it is a waste.

  • LocalLLaMA @sh.itjust.works
    morrowind @lemm.ee

StarVector - a foundation model for generating SVGs

    LocalLLaMA @sh.itjust.works
    morrowind @lemm.ee

EXAONE Deep - Setting a New Standard for Reasoning AI - LG AI Research News

    LocalLLaMA @sh.itjust.works
    morrowind @lemm.ee

Reka Flash, an open-source 21B model comparable to QwQ 32B

    LocalLLaMA @sh.itjust.works
    morrowind @lemm.ee
    arxiv.org Chain of Draft: Thinking Faster by Writing Less

    Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermedia...

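
    As a rough illustration, here is a sketch of what a Chain-of-Draft-style prompt could look like against an OpenAI-compatible API; the system prompt paraphrases the paper's idea rather than quoting it, and the model name is just an example.

    ```python
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Chain of Draft keeps the step-by-step structure of CoT but caps each
    # intermediate step to a terse draft, cutting output tokens and latency.
    COD_SYSTEM = (
        "Think step by step, but keep only a minimal draft for each step, "
        "at most five words per step. Give the final answer after '####'."
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat model works
        messages=[
            {"role": "system", "content": COD_SYSTEM},
            {"role": "user", "content": "Jason had 20 lollipops. He gave Denny some. Now he has 12. How many did he give Denny?"},
        ],
    )
    print(resp.choices[0].message.content)
    ```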
    LocalLLaMA @sh.itjust.works
    morrowind @lemm.ee

    Atom of Thoughts (AOT): lifts gpt-4o-mini to 80.6% F1 on HotpotQA, surpassing o3-mini and DeepSeek-R1

    bsky.app Sung Kim (@sungkim.bsky.social)

    Atom of Thoughts (AOT) lifts gpt-4o-mini to 80.6% F1 on HotpotQA, surpassing o3-mini and DeepSeek-R1! For each reasoning step, it:

    1. Decomposes the question into a DAG
    2. Contracts the subquestions into a NEW, simpler question
    3. Iterates until reaching an atomic question

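
    For illustration, here is a minimal sketch of that three-step loop; `llm` is a hypothetical prompt-to-completion callable the caller supplies, and the prompt wording is invented, not taken from the paper.

    ```python
    from typing import Callable

    def aot(question: str, llm: Callable[[str], str], max_iters: int = 5) -> str:
        """Atom-of-Thoughts loop: decompose, contract, iterate until atomic."""
        for _ in range(max_iters):
            # Stop once the question no longer decomposes (model-judged).
            verdict = llm(f"Is this question answerable in one step? yes/no\n{question}")
            if verdict.strip().lower().startswith("yes"):
                break
            # 1. Decompose the question into a DAG of dependent subquestions.
            dag = llm(f"Decompose into subquestions, noting dependencies:\n{question}")
            # 2. Contract the subquestions into a NEW, simpler question.
            question = llm(
                f"Given these subquestions and dependencies:\n{dag}\n"
                "Fold what is already resolvable into one simpler question."
            )
        # 3. After iterating to an (approximately) atomic question, answer it.
        return llm(f"Answer directly:\n{question}")
    ```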