I recently upgraded my desktop with a GeForce RTX 5060 Ti 16GiB, and naturally, my first instinct was to put that VRAM to work by setting up local AI search.
I settled on Perplexica, an open-source AI-powered search engine, with Ollama handling the inference. Since my daily driver is Fedora 43, I wanted to do this with rootless Podman Quadlets rather than Docker Compose.
Here’s my guide to orchestrating Perplexica and Ollama using systemd and NVIDIA CDI on Fedora.
Prerequisites
Before diving in, ensure you have the proprietary NVIDIA drivers installed and functioning. We will be running everything as a standard user (rootless), which is safer and cleaner, but requires some specific configuration for GPU access.
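If you’re not sure the driver stack is healthy, a quick sanity check on the host (outside any container) should print your GPU model and driver version:
# Confirm the host driver works before involving Podman
nvidia-smi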
Step 1: NVIDIA Container Toolkit & CDI
Podman needs a way to pass the GPU through to the container. The modern standard for this is the Container Device Interface (CDI).
First, add the NVIDIA repository and install the NVIDIA Container Toolkit:
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
| sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit nvidia-container-toolkit-base \
  libnvidia-container-tools libnvidia-container1
Next, we generate the CDI specification. This creates a YAML file that tells Podman exactly how to access your GPU without needing complex hooks.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
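To confirm the spec was generated correctly, list the CDI devices Podman can now see; you should get entries such as nvidia.com/gpu=0 and nvidia.com/gpu=all:
# List the device names from the generated CDI spec
nvidia-ctk cdi list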
Step 2: Podman Network
Since we are using systemd Quadlets, we can define a dedicated bridge network so our backend (Ollama) and frontend (Perplexica) can talk to each other without exposing everything to the host network.
Create the directory for your user-level systemd units if it doesn’t exist:
mkdir -p ~/.config/containers/systemd
Create the network unit file:
# ~/.config/containers/systemd/ai-net.network
[Unit]
Description=AI Bridge Network
[Network]
NetworkName=ai-net
Driver=bridge
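Quadlet derives the service name from the file name, so this file becomes ai-net-network.service. Containers that reference it will pull it in automatically, but after a daemon-reload you can also start it by hand to inspect the bridge:
systemctl --user daemon-reload
systemctl --user start ai-net-network
podman network inspect ai-net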
Step 3: Setting up Ollama
Ollama will act as the backend API server.
SELinux Configuration
Fedora is secure by default. To allow our rootless containers to access hardware devices (like the GPU), we need to flip a specific SELinux boolean:
sudo setsebool -P container_use_devices 1
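You can verify the boolean took effect with:
getsebool container_use_devices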
The Ollama Quadlet
Create the container definition. Note the AddDevice line: it uses the CDI spec we generated earlier to pass the full GPU through to the container.
# ~/.config/containers/systemd/ollama.container
[Unit]
Description=Ollama AI Backend
After=network-online.target
[Container]
Image=docker.io/ollama/ollama:latest
ContainerName=ollama
Network=ai-net.network
Volume=%h/.ollama:/root/.ollama:Z
# Enable GPU Access via CDI
AddDevice=nvidia.com/gpu=all
AutoUpdate=registry
[Install]
WantedBy=default.target
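Note that no PublishPort is set here: Ollama is only reachable from other containers on ai-net. If you also want to call the API directly from the host (for example with curl), add one line to the [Container] section to publish Ollama’s default port:
# Optional: expose the Ollama API on the host
PublishPort=11434:11434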
Enable and Start
Create the model directory, reload the systemd daemon so it recognizes the new Quadlet files, enable “linger” (so the container runs even when you aren’t logged in), and start the service.
mkdir -p ~/.ollama
systemctl --user daemon-reload
loginctl enable-linger $(whoami)
# Verify the unit file is valid (optional but recommended)
/usr/libexec/podman/quadlet -dryrun -user
# Start the service
systemctl --user start ollama
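Check that the service came up and that the GPU is visible from inside the container. The generated CDI spec normally mounts the nvidia-smi binary into the container, so the second command should print the same table you see on the host:
# Confirm the service is healthy
systemctl --user status ollama
# Confirm the GPU is visible inside the container
podman exec ollama nvidia-smi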
Step 4: Setting up Perplexica
Now for the frontend. Perplexica handles the search logic and UI.
The Perplexica Quadlet
We will use the “All-In-One” image for simplicity.
# ~/.config/containers/systemd/perplexica.container
[Unit]
Description=Perplexica AI Search
After=network-online.target
[Container]
Image=docker.io/itzcrazykns1337/perplexica:latest
ContainerName=perplexica
PublishPort=3000:3000
Volume=%h/.perplexica/data:/home/perplexica/data:Z
Volume=%h/.perplexica/uploads:/home/perplexica/uploads:Z
Network=ai-net.network
AutoUpdate=registry
[Install]
WantedBy=default.target
Create the directories and start the service:
mkdir -p ~/.perplexica/{data,uploads}
systemctl --user daemon-reload
systemctl --user start perplexica
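Give it a few seconds to initialize, then confirm the UI is answering (you should get an HTTP success status):
curl -I http://localhost:3000/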
Pulling the Models
Before we configure the UI, we need to download the models we intend to use. I’m using Ministral-3-14B for chat and nomic-embed-text-v2-moe for embeddings, which together fit snugly in the GPU’s VRAM.
# Pull Chat Model
podman exec -it ollama ollama pull hf.co/bartowski/mistralai_Ministral-3-14B-Instruct-2512-GGUF:Q5_K_M
# Pull Embedding Model
podman exec -it ollama ollama pull nomic-embed-text-v2-moe:latest
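Verify that both models downloaded successfully:
# List the models Ollama now has available
podman exec -it ollama ollama list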
Step 5: Configuration
Now, open your browser and navigate to http://localhost:3000/.
- Click Add Connection.
- Select connection type: Ollama
- Connection Name: ollama
- Base URL: http://ollama:11434
- Click Add Connection to save.
Note that the Base URL uses the container name ollama rather than localhost; it resolves via DNS on the ai-net bridge we created earlier.
Perplexica might try to auto-assign all models to all categories.
- Remove the Embedding model from the Chat Models list.
- Remove the Chat model from the Embedding Models list.
Finally, map the specific models:
- Chat Model: Select ollama - hf.co/bartowski/mistralai_Ministral-3-14B...
- Embedding Model: Select ollama - nomic-embed-text-v2-moe...
Click Finish.
Conclusion
We now have a fully functional AI search engine running locally, accelerated by the NVIDIA GPU. You can verify GPU usage during a search query by running:
watch -n 0.5 nvidia-smi
Using Podman Quadlets makes this setup robust; the containers start at boot, update automatically, and respect standard systemd management commands.
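For day-to-day management, the usual systemd commands apply to both services; for example:
# Follow the backend logs
journalctl --user -u ollama -f
# Restart the frontend after a configuration change
systemctl --user restart perplexica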