
How to Install Ollama in 2026: Step-by-Step Guide

Install Ollama on macOS, Windows, or Linux, choose a local model, test the API, configure Open WebUI, and solve common problems.

Author admin

Ollama lets you run language models on your own computer: requests to a local model are processed without sending text to a third-party cloud service. In this guide, we install Ollama on macOS, Windows, or Linux, launch a first model, check the API, and, optionally, add the Open WebUI graphical interface.

Current as of June 21, 2026. The latest stable release is Ollama v0.30.10, published June 17. The interface and commands may change in future versions.

What is Ollama and why is it needed?

Ollama is a free tool for downloading and running open models locally. Its official library includes Qwen, Llama, DeepSeek, Gemma, Mistral, and other model families. Ollama manages model files, uses available GPU acceleration, and provides a local API at http://localhost:11434.

Once a local model has been downloaded, ordinary chat does not need an Internet connection. A connection is still required for installation, updates, and downloading models. Ollama also offers cloud models: if privacy is a concern, choose a local model tag and check where each request is sent.

System requirements: how much memory you need

The official documentation specifies operating system and GPU compatibility requirements, but does not specify a universal RAM or VRAM minimum. Memory consumption depends on model size, quantization, and context length. Therefore, the table below is a practical guide and not a guarantee of performance.

Configuration | Where to start | Comment
8 GB RAM | llama3.2:1b or llama3.2:3b | Suitable for learning the basics; close memory-heavy apps
16 GB RAM / unified memory | qwen3:4b, sometimes qwen3:8b | A practical minimum for compact models
32 GB | 8B–14B models | More headroom for context and other running apps
32–64 GB or more | qwen3:30b, qwen3-coder:30b | The model file is about 19 GB, but additional memory is required

  • macOS: Requires macOS Sonoma 14 or later. Apple Silicon uses the GPU through Metal and shared memory; Intel Macs are supported in CPU mode.
  • Windows: Requires Windows 10 22H2 or later. The installer works without administrator rights.
  • NVIDIA: According to the official compatibility list, you need compute capability 5.0+ and an up-to-date driver; for recent versions, the documentation lists driver 531+.
  • Disk: A Windows installation requires at least 4 GB, and models take up anywhere from hundreds of megabytes to hundreds of gigabytes.
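
The disk figures above can be checked before a large download. The following is a minimal sketch using only the Python standard library; the function names and the 1 GB safety margin are assumptions made for illustration, not part of Ollama.

```python
import shutil


def free_disk_gb(path: str = ".") -> float:
    """Return the free space, in decimal gigabytes, on the drive that holds path."""
    return shutil.disk_usage(path).free / 1e9


def has_room(required_gb: float, path: str = ".", margin_gb: float = 1.0) -> bool:
    """Return True if the drive has required_gb free plus a safety margin.

    margin_gb is an illustrative assumption, not an Ollama requirement.
    """
    return free_disk_gb(path) >= required_gb + margin_gb
```

For example, has_room(4) corresponds to the 4 GB Windows installation figure, and has_room(19) to the roughly 19 GB model file mentioned in the table. Pass the drive or folder where models will be stored as path.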

Step 1: Install Ollama on macOS

The simplest way is to open the official download page, download Ollama.dmg, move the application to the Applications folder, and run it. On first launch, Ollama will offer to add the ollama command to your PATH.

Figure: Ways to install Ollama on macOS and verify the version in Terminal. The official DMG is the simplest way to install Ollama on macOS, while the Terminal command is convenient for developers.

An alternative official method is to install with one command:

curl -fsSL https://ollama.com/install.sh | sh

Ollama is also available through Homebrew, which is a third-party package manager rather than an official Ollama channel. If you already use Homebrew:

brew install ollama

After installation, close and reopen Terminal, then check the version:

ollama --version
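
If you want to run the same check from a script, it can be sketched in Python as below. This is an illustrative helper, not an Ollama API: the function names are hypothetical, and the code assumes only that the command prints a version number of the form X.Y.Z somewhere in its output.

```python
import re
import shutil
import subprocess
from typing import Optional


def extract_version(text: str) -> Optional[str]:
    """Return the first X.Y.Z-style version number found in text, or None."""
    match = re.search(r"\d+\.\d+\.\d+", text)
    return match.group(0) if match else None


def installed_ollama_version() -> Optional[str]:
    """Run `ollama --version` and return the parsed version, or None if unavailable."""
    if shutil.which("ollama") is None:
        return None  # the command is not on PATH; try reopening the terminal
    result = subprocess.run(
        ["ollama", "--version"], capture_output=True, text=True, check=False
    )
    # Search stdout first, then stderr, in case the text lands on either stream.
    return extract_version(result.stdout + result.stderr)
```

A return value of None from installed_ollama_version() means either that the command was not found or that no version number could be parsed from its output.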

Step 2: Install Ollama on Windows

On Windows, download OllamaSetup.exe from the official page. In the v0.30.10 release, the installer is about 1.3 GB. It does not require administrator rights and installs in the user’s home directory by default.

Figure: Installing Ollama on Windows with OllamaSetup.exe or the official PowerShell command.

Official installation via PowerShell:

irm https://ollama.com/install.ps1 | iex

Once complete, open a new PowerShell window and run:

ollama --version

How to move models to another drive

If drive C is short on space, open “Edit environment variables for your account”, create a variable named OLLAMA_MODELS, and set it to the new location, for example D:\OllamaModels. After saving, quit Ollama completely from the system tray and launch it again. This procedure is described in the Ollama documentation for Windows.
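
To confirm which folder is in effect and how much space it uses, a small script can read the same variable. This is a sketch under stated assumptions: the helper names are hypothetical, and the fallback path ~/.ollama/models is an assumed default that this article does not state, so verify it for your platform.

```python
import os
from pathlib import Path
from typing import Mapping


def models_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Return the folder named by OLLAMA_MODELS, or an assumed default."""
    custom = env.get("OLLAMA_MODELS", "").strip()
    if custom:
        return Path(custom)
    # Assumed default location; check the Ollama documentation for your OS.
    return Path.home() / ".ollama" / "models"


def dir_size_gb(path: Path) -> float:
    """Return the total size of all files under path in decimal gigabytes."""
    if not path.is_dir():
        return 0.0
    total = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    return total / 1e9
```

Calling dir_size_gb(models_dir()) in a new terminal session, opened after the variable was saved, shows whether models are landing on the intended drive.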

Installation on Linux

For most Linux systems, the official project offers the same installation script:

curl -fsSL https://ollama.com/install.sh | sh

After installation, check the version with ollama --version and the service with systemctl status ollama. For NVIDIA, AMD, and experimental Vulkan support, check the latest hardware support page.

Step 3: Download and run the first model

The ollama run command downloads the model and then opens an interactive chat. On a computer with 8 GB RAM, start with the compact Llama 3.2:

ollama run llama3.2:3b

With 16 GB of memory, try qwen3:4b first, then qwen3:8b if the system has enough headroom:

ollama run qwen3:4b
ollama run qwen3:8b

Figure: Comparison of local Ollama models by size and purpose.

Start with a small model and move to a larger one only when you have enough spare RAM or unified memory.
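
The starting advice in this step can be written down as a small lookup. The sketch below encodes only the two tiers named in this step (8 GB and 16 GB); the function name is hypothetical, and larger machines are deliberately not covered because the guide gives ranges rather than a single threshold for them.

```python
from typing import List


def first_models_to_try(ram_gb: float) -> List[str]:
    """Return the model tags this guide suggests trying first, in order.

    Only the 8 GB and 16 GB tiers from the text are encoded; this is a
    starting point, not a guarantee that the model will fit or run well.
    """
    if ram_gb >= 16:
        # Try the smaller tag first, then the larger one if memory allows.
        return ["qwen3:4b", "qwen3:8b"]
    return ["llama3.2:3b"]
```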

Type a question at the >>> prompt. To exit, use /bye or press Ctrl+D. Useful commands:

ollama list
ollama ps
ollama pull qwen3:8b
ollama rm qwen3:8b

  • ollama list: show downloaded models.
  • ollama ps: show models loaded into memory.
  • ollama pull: download or update a model.
  • ollama rm: delete a model from disk.

Which models to choose in June 2026

Model | Download size | Suitable for
llama3.2:3b | 2.0 GB | First launch, summarization, simple tasks
qwen3:4b | 2.5 GB | Compact universal assistant
qwen3:8b | 5.2 GB | Texts, analysis, multilingual tasks
deepseek-r1:8b | 5.2 GB | Reasoning, mathematics and analytics
qwen3:30b | 19 GB | More complex universal tasks
qwen3-coder:30b | 19 GB | Programming and working with more context

The download sizes come from the official model cards for Qwen 3, Qwen3-Coder, DeepSeek-R1, and Llama 3.2. They are file sizes, not exact RAM consumption figures.
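
To see why a file size understates memory use, it helps to separate the two main contributions: the quantized weights and the key-value cache, which grows with context length. The sketch below is a back-of-the-envelope estimate, not how Ollama measures memory. The cache formula is the usual one for transformer models, and every architecture number in the example is hypothetical.

```python
def weights_bytes(params: int, bits_per_weight: int) -> int:
    """Approximate size of the model weights at a given quantization."""
    return params * bits_per_weight // 8


def kv_cache_bytes(
    layers: int,
    kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # 2 bytes assumes a 16-bit cache
) -> int:
    """Approximate size of the key-value cache for one sequence."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
```

For an 8-billion-parameter model at 4 bits per weight, weights_bytes gives about 4 GB. For a hypothetical architecture with 32 layers, 8 key-value heads, and a head dimension of 128, an 8,192-token context adds 1 GiB of cache, and doubling the context doubles that figure. Real files are usually somewhat larger than the weight estimate because quantized formats mix precisions and carry metadata, and the runtime needs working memory on top.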

Step 4: Check your local API

When Ollama is running, the API is available locally on port 11434. To list the downloaded models:

curl http://localhost:11434/api/tags
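
The same check can be done from Python with the standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes the endpoint returns a JSON object with a "models" list whose entries carry a "name" field.

```python
import json
import urllib.request
from typing import Any, Dict, List

OLLAMA_URL = "http://localhost:11434"


def model_names(payload: Dict[str, Any]) -> List[str]:
    """Extract model names from a parsed /api/tags response."""
    return [entry["name"] for entry in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> List[str]:
    """Ask a running Ollama instance which models are downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

list_local_models() raises an error if nothing is listening on the port, which is itself a useful signal that Ollama is not running.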

Example of a single request to a model:

curl http://localhost:11434/api/generate \
  -d '{"model":"qwen3:4b","prompt":"Explain what a local LLM is","stream":false}'
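
The curl request above translates to Python roughly as follows. This is a sketch rather than an official client: the function names are hypothetical, and it assumes that a non-streaming reply is a single JSON object with the generated text in its "response" field.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def build_generate_request(
    model: str, prompt: str, base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build the POST request for /api/generate with streaming turned off."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one prompt to a running Ollama instance and return the reply text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

With Ollama running and the model downloaded, generate("qwen3:4b", "Explain what a local LLM is") returns the answer as a string. The generous timeout allows for the first request, when the model still has to be loaded into memory.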

Step 5: Install Open WebUI

If the terminal is inconvenient, Open WebUI adds an interface similar to a familiar chat app. First install Docker Desktop, then run the command from the official Open WebUI instructions:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main

Figure: Local Open WebUI interface connected to Ollama.

After starting the container, open http://localhost:3000 and create a local administrator account.

The :main tag is updated over time. For a stable working installation, Open WebUI recommends pinning a specific version of the image. Also, do not expose port 3000 to the Internet without authentication, HTTPS, and basic server hardening.

What to do if Ollama doesn’t work

  • Command not found: restart the terminal; on macOS, check for the link in /usr/local/bin.
  • The model is too slow: choose a smaller model, reduce the context, and close memory-hogging applications.
  • Using CPU instead of GPU: update the driver and check the video card against the official support list.
  • Not enough space: remove unneeded models with ollama rm or move the model folder to another drive.
  • Open WebUI doesn’t see Ollama: check that http://localhost:11434/api/tags responds and that OLLAMA_BASE_URL is set correctly (http://host.docker.internal:11434 in the Docker command above).
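
The first and last checks in this list can be automated. The sketch below is an illustrative helper with hypothetical function names; it only tests whether the command is on PATH and whether the local API answers, and does not diagnose GPU or memory problems.

```python
import http.client
import shutil
import urllib.request
from typing import List


def ollama_on_path() -> bool:
    """Return True if the ollama command can be found on PATH."""
    return shutil.which("ollama") is not None


def api_responds(
    url: str = "http://localhost:11434/api/tags", timeout: float = 3.0
) -> bool:
    """Return True if the local API answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a malformed reply.
        return False


def diagnose(on_path: bool, api_ok: bool) -> List[str]:
    """Turn the two check results into human-readable hints."""
    hints = []
    if not on_path:
        hints.append("ollama not found on PATH: restart the terminal")
    if not api_ok:
        hints.append("API not responding on port 11434: make sure Ollama is running")
    return hints
```

Calling diagnose(ollama_on_path(), api_responds()) returns an empty list when both checks pass.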

FAQ

Is Ollama completely free?

The tool itself is free. Each model has its own license that must be verified before commercial use. Cloud functions may also have separate terms and conditions.


Can I use Ollama without a video card?

Yes, local models can run on the CPU, but generation is usually noticeably slower. Start with a 1B–3B model.

Does Ollama work without the Internet?

Yes, once Ollama is installed and a local model has been downloaded. An Internet connection is still needed to download and update models and to use cloud features.

Which model is better for the first launch?

With 8 GB of RAM, start with llama3.2:3b. With 16 GB, try qwen3:4b and then qwen3:8b if the system still has enough free memory.

Conclusion

To get started, install Ollama through an official channel, confirm the installation with ollama --version, and launch a compact model. Don’t start with 30B models just because they look more powerful: a small model that fits entirely in available memory often gives a better, faster local experience.

The cover and step-by-step images are generic mock-up screens. They do not contain real accounts, keys, home directories, or other personal data.
