Ollama Setup

Ollama provides an easy way to run large language models locally.

Installation

Download the installer from ollama.ai/download. After installing, two settings are worth attention:

  • If you enable "Expose Ollama to the network", be sure it's a network you trust.
  • Experiment with different context windows: start small and increase gradually until you find the sweet spot between speed and intelligence.
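One way to act on the context-window tip above is a Modelfile that overrides num_ctx for a model you've already pulled. A minimal sketch, assuming a hypothetical base model named llama3.2 (substitute any model you have):

```
# Hypothetical Modelfile: override the context window for an existing model.
FROM llama3.2

# num_ctx is the context window size in tokens. Start small and raise it
# until generation speed drops below what you can tolerate.
PARAMETER num_ctx 8192
```

Build it with `ollama create llama3.2-8k -f Modelfile`, then compare its speed against the default.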

Useful Commands

ollama run <model>   # Run a model interactively
ollama list          # List downloaded models
ollama rm <model>    # Remove a model
ollama serve         # Start the API server
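Once `ollama serve` is running (it listens on http://localhost:11434 by default), you can talk to it over its REST API. A minimal sketch of building the JSON body for the /api/generate endpoint; "llama3.2" is a placeholder model name:

```python
import json

def build_generate_request(model, prompt):
    """Build the JSON body for a POST to Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response instead of a stream
    }

body = json.dumps(build_generate_request("llama3.2", "Why is the sky blue?"))
print(body)
```

You can send this body with any HTTP client, e.g. `curl http://localhost:11434/api/generate -d '...'`.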

Pull Models

Start by visiting the Ollama models catalog.

Each page includes instructions on how to pull the models.

Hugging Face models

You can also download models from Hugging Face. If you sign up and register your hardware in your profile, the site will help you decide which models you can run on your system.

I'm currently running https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF?show_file_info=Qwen3.6-35B-A3B-UD-IQ4_XS.gguf

To get these running in Ollama, you have to wrap the downloaded GGUF file in a Modelfile and import it with ollama create.

You can get the Modelfile from https://github.com/jpmanson/llm_templates

Update the Modelfile's FROM line to point to the model you downloaded.

e.g.

FROM ../Qwen3.6-35B-A3B-UD-IQ4_XS.gguf
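Beyond the FROM line, a Modelfile usually also carries a TEMPLATE block in Go template syntax. A sketch using the ChatML format that Qwen-family models expect; the real template for your model should come from the llm_templates repo above, not from this example:

```
FROM ../Qwen3.6-35B-A3B-UD-IQ4_XS.gguf

# Example ChatML-style template; replace with the one from llm_templates.
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```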

Then run the ollama create command.

e.g.

ollama create Qwen3.6-35B-A3B-UD:IQ4_XS -f .\llm-templates\Modelfile-qwen3

Once you are happy with the model you've created, you can delete the original downloaded GGUF file, as it shouldn't be needed anymore.