Self-Hosted AI Search: Running Perplexica with Ollama on Fedora 43


I recently upgraded my desktop with a GeForce RTX 5060 Ti 16GiB, and naturally, my first instinct was to put that VRAM to work by setting up local AI search.

I settled on Perplexica, an open-source AI-powered search engine, backed by Ollama for inference. Since my daily driver is Fedora 43, I wanted to do this using rootless Podman Quadlets rather than Docker Compose.

Here’s my guide on how to orchestrate Perplexica and Ollama using systemd and NVIDIA CDI on Fedora.

Prerequisites

Before diving in, ensure you have the proprietary NVIDIA drivers installed and functioning. We will be running everything as a standard user (rootless), which is safer and cleaner, but requires some specific configuration for GPU access.
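
A quick sanity check that the driver stack is working before we touch any containers:

# Should list the GPU along with the driver version
nvidia-smi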

Step 1: NVIDIA Container Toolkit & CDI

Podman needs a way to pass the GPU through to the container. The modern standard for this is the Container Device Interface (CDI).

First, add the NVIDIA repository and install the NVIDIA Container Toolkit:

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
  | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

sudo dnf install -y nvidia-container-toolkit nvidia-container-toolkit-base \
  libnvidia-container-tools libnvidia-container1

Next, we generate the CDI specification. This creates a YAML file that tells Podman exactly how to access your GPU without needing complex hooks.

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
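
You can verify the spec was generated and see the device names it exposes; the nvidia.com/gpu=all entry is what we will reference from the Quadlet later:

nvidia-ctk cdi list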

Step 2: Podman Network

Since we are using systemd Quadlets, we can define a dedicated bridge network so our backend (Ollama) and frontend (Perplexica) can talk to each other without exposing everything to the host network.

Create the directory for your user-level systemd units if it doesn’t exist:

mkdir -p ~/.config/containers/systemd

Create the network unit file:

# ~/.config/containers/systemd/ai-net.network
[Unit]
Description=AI Bridge Network

[Network]
NetworkName=ai-net
Driver=bridge
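
After a daemon-reload, Quadlet turns this file into a systemd service (which should be named ai-net-network.service). The bridge itself is created lazily when the first container referencing it starts, after which it can be inspected:

systemctl --user daemon-reload

# Once a container that references the network has started:
podman network inspect ai-net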

Step 3: Setting up Ollama

Ollama will act as the backend API server.

SELinux Configuration

Fedora is secure by default. To allow our rootless containers to access hardware devices (like the GPU), we need to flip a specific SELinux boolean:

sudo setsebool -P container_use_devices 1
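
To double-check the boolean stuck (the -P flag makes it persistent across reboots):

getsebool container_use_devices
# Expected: container_use_devices --> on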

The Ollama Quadlet

Create the container definition. Note the AddDevice line—this utilizes the CDI spec we generated earlier to pass the full GPU to the container.

# ~/.config/containers/systemd/ollama.container
[Unit]
Description=Ollama AI Backend
After=network-online.target

[Container]
Image=docker.io/ollama/ollama:latest
ContainerName=ollama
Network=ai-net.network
Volume=%h/.ollama:/root/.ollama:Z
# Enable GPU Access via CDI
AddDevice=nvidia.com/gpu=all
AutoUpdate=registry

[Install]
WantedBy=default.target

Enable and Start

Reload the systemd daemon so it recognizes the new Quadlet files, enable “linger” (so the container runs even when you aren’t logged in), and start the service.

mkdir -p ~/.ollama
systemctl --user daemon-reload
loginctl enable-linger $(whoami)

# Verify the unit file is valid (optional but recommended)
/usr/libexec/podman/quadlet -dryrun -user

# Start the service
systemctl --user start ollama
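
To confirm the service came up and the GPU is actually visible inside the container, a couple of quick checks (the generated CDI spec mounts nvidia-smi into the container, so the second command should work in most setups):

systemctl --user status ollama

# The GPU should show up inside the container via the CDI device
podman exec ollama nvidia-smi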

Step 4: Setting up Perplexica

Now for the frontend. Perplexica handles the search logic and UI.

The Perplexica Quadlet

We will use the “All-In-One” image for simplicity.

# ~/.config/containers/systemd/perplexica.container
[Unit]
Description=Perplexica AI Search
After=network-online.target

[Container]
Image=docker.io/itzcrazykns1337/perplexica:latest
ContainerName=perplexica
Network=ai-net.network
PublishPort=3000:3000
Volume=%h/.perplexica/data:/home/perplexica/data:Z
Volume=%h/.perplexica/uploads:/home/perplexica/uploads:Z
AutoUpdate=registry

[Install]
WantedBy=default.target

Create the directories and start the service:

mkdir -p ~/.perplexica/{data,uploads}
systemctl --user daemon-reload
systemctl --user start perplexica
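
A quick check that the container is up and the UI is answering on port 3000:

systemctl --user status perplexica

# Should return an HTTP status line once the app has finished starting
curl -sI http://localhost:3000 | head -n 1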

Pulling the Models

Before we configure the UI, we need to download the specific models we intend to use. I'm using Ministral-3-14B for chat and nomic-embed-text for embeddings, both of which fit snugly in the card's 16 GiB of VRAM.

# Pull Chat Model
podman exec -it ollama ollama pull hf.co/bartowski/mistralai_Ministral-3-14B-Instruct-2512-GGUF:Q5_K_M

# Pull Embedding Model
podman exec -it ollama ollama pull nomic-embed-text-v2-moe:latest
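
Once both pulls finish, you can confirm the models are available (the sizes shown are whatever Ollama reports for the downloaded quantizations):

podman exec -it ollama ollama list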

Step 5: Configuration

Now, open your browser and navigate to http://localhost:3000/.

  1. Click Add Connection
    • Select connection type Ollama
    • Connection Name: ollama
    • Base URL: http://ollama:11434 (the container name resolves because both containers share the ai-net bridge)
  2. Click Add Connection to save.

Perplexica might try to auto-assign all models to all categories.

  • Remove the Embedding model from the Chat Models list.
  • Remove the Chat model from the Embedding Models list.

Finally, map the specific models:

  • Chat Model: Select ollama - hf.co/bartowski/mistralai_Ministral-3-14B...
  • Embedding Model: Select ollama - nomic-embed-text-v2-moe...

Click Finish.

Conclusion

You now have a fully functional AI search engine running on your local machine, accelerated by an NVIDIA GPU. You can verify the GPU usage during a search query by running:

watch -n 0.5 nvidia-smi

Using Podman Quadlets makes this setup robust; the containers start at boot, update automatically, and respect standard systemd management commands.
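
The automatic updates come from the AutoUpdate=registry lines in the Quadlets; Podman applies them via a user-level timer, which you may need to enable yourself (--dry-run previews what would be pulled):

systemctl --user enable --now podman-auto-update.timer

# Preview which containers have newer images available
podman auto-update --dry-run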

