Simon Willison's Weblog

That model was trained in part utilizing their unreleased R1 "thinking" design. Today they have actually launched R1 itself, together with an entire household of new models obtained from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like the majority of their other releases, is under a clean MIT license. DeepSeek alert that "DeepSeek-R1-Zero comes across difficulties such as endless repetition, bad readability, and language mixing." ... so they also launched:

DeepSeek-R1-which "includes cold-start information before RL" and "attains performance equivalent to OpenAI-o1 throughout mathematics, code, and reasoning tasks". That one is likewise MIT certified, and is a comparable size.

I do not have the ability to run models bigger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two designs are something I can quickly play with myself. That's where the new distilled designs are available in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based upon Llama and Qwen.

This is an interesting flex! They have models based upon Qwen 2.5 (14B, photorum.eclat-mauve.fr 32B, Math 1.5 B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license connected, which I'm uncertain is compatible with the underlying Llama license. Qwen models are Apache accredited so possibly MIT is OK?

Licensing aside, these distilled designs are fascinating monsters.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized variations are currently starting to appear. Up until now I've tried just among those- unsloth/DeepSeek-R 1-Distill-Llama-8B-GGUF released by Unsloth AI-and it's actually fun to play with.

I'm running it using the mix of Ollama, LLM and wavedream.wiki the llm-ollama plugin.

First I fetched and timeoftheworld.date ran the model using Ollama itself:

This downloads around 8.5 GB of design information and starts an interactive chat user interface.

Once the design has been fetched, LLM can talk with it also. I choose using LLM for experiments because it logs whatever to SQLite for later exploration.

I set up the plugin and ran it like this:

Each action starts with a ... pseudo-XML tag containing the chain of idea used to assist generate the action. For "a joke about a pelican and a walrus who run a tea space together" that began like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea space together. Hmm, that's an interesting mix. Let me believe about how these two characters might communicate in an amusing situation.

First, let's consider their qualities. Pelicans are understood for their long beaks and Webbed feet, typically seen near the beach or water. Walruses have huge teeth, thick fur, and they're usually discovered in colder climates, like icebergs or snowy locations. So, combining these two into a tea room setting is currently an amusing image.

[...]

It then believed for 20 paragraphs before outputting the joke! Here's the complete records and here's the output at the end, after the tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea space together. The walrus, with its huge size, has a hard time to discover comfy chairs, so it rests on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to assist, utilizes its beak to put dishes on the table, causing a few spills.

After a series of funny accidents, the walrus searches for and states with a smile, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: qoocle.com Pelican and Walrus run an unlikely tea room. Their sizes result in comical mayhem.
- Punchline: "This isn't so fishy anymore" is a pun on the circumstance becoming less absurd. The follow-up joke presents "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is terrible. But the procedure of getting there was such a fascinating insight into how these brand-new models work.

This was a fairly little 8B model. I'm eagerly anticipating trying the Llama 70B variation, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B-currently my favourite GPT-4 class design that I've worked on my own machine-I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled design working like this-a 34GB download:

Can it draw a pelican?

I attempted my timeless Generate an SVG of a pelican riding a bicycle timely too. It did refrain from doing effectively:

It aimed to me like it got the order of the components incorrect, honkaistarrail.wiki so I followed up with:

the background wound up covering the remainder of the image

It thought some more and gave me this:

Just like the earlier joke, the chain of believed in the transcript was much more fascinating than completion outcome.

Other methods to try DeepSeek-R1

If you wish to try the model out without installing anything at all you can do so utilizing chat.deepseek.com-you'll require to produce an account (check in with Google, utilize an email address or offer a Chinese +86 telephone number) and after that pick the "DeepThink" choice listed below the timely input box.

DeepSeek offer the model by means of their API, utilizing an OpenAI-imitating endpoint. You can access that through LLM by dropping this into your extra-openai-models. yaml setup file:

Then run llm keys set deepseek and paste in your API key, forum.altaycoins.com then utilize llm -m deepseek-reasoner 'timely' to run prompts.

This will not reveal you the thinking tokens, unfortunately. Those are dished out by the API (example here) however LLM does not yet have a method to display them.