Every time you drag a PDF onto a "free PDF merger" website, you are handing a document - often a contract, a payslip, a scanned passport - to a stranger's server. Those sites pay their bills with your files: ads, trackers, and in the worst cases a quiet copy kept around. For a one-off it feels harmless, but the moment you are merging tax documents or signing an NDA, you really do not want that upload happening.
Stirling-PDF is the fix. It is a self-hosted web app that does just about everything those scattered websites do - merge, split, rotate, compress, convert, OCR, watermark, sign, redact, fill forms - all from one clean interface, all on your own server. Nothing leaves the box. This guide walks through a production-ready install on a VPS using Docker, with Caddy in front for automatic HTTPS and a login so it is not open to the world.
pdf.example.com pointed at your server
A single docker-compose.yml
Total time: about 15 minutes.
Ports 80 and 443 open to the internet (Let's Encrypt needs them)
Stirling-PDF is a Java application, so it idles a little heavier than a Go app - figure on 200-400 MB of RAM at rest. The CPU-hungry operations are OCR and compression, and those run only while you actively use them, so a small VPS handles a single user comfortably.
In your DNS provider, create an A record:
pdf.example.com → YOUR_VPS_IPV4
Add an AAAA record too if your server has IPv6. Confirm it resolves:
dig +short pdf.example.com
The output should be your VPS IP. Caddy cannot issue a certificate until DNS points at the box.
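If you would rather script that check than eyeball it, a minimal sketch follows. The helper name check_dns is hypothetical, and the hostname and IP in the usage line are placeholders for your own values.

```shell
# check_dns is a hypothetical helper: compare what DNS returns with the IP you expect.
check_dns() {
  local host="$1" expected="$2" resolved
  # dig +short prints one address per line; empty output means no A record is visible yet.
  resolved="$(dig +short "$host" A)"
  if printf '%s\n' "$resolved" | grep -qxF -- "$expected"; then
    echo "ok: $host resolves to $expected"
  else
    echo "mismatch: $host resolves to '${resolved:-nothing}', expected $expected" >&2
    return 1
  fi
}
```

Call it as check_dns pdf.example.com YOUR_VPS_IPV4 and only move on to the next step once it prints the ok line.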
On a fresh Ubuntu 22.04 or 24.04 server:
sudo apt update
sudo apt install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
-o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Check it works:
docker --version
docker compose version
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
Caddy needs 80 for the ACME HTTP challenge and 443 for HTTPS. Do not expose Stirling-PDF's internal port 8080 to the internet - Caddy is the only thing that should answer publicly.
sudo mkdir -p /opt/stirling-pdf
cd /opt/stirling-pdf
sudo mkdir -p tessdata configs customFiles logs caddy-data caddy-config
These directories map to how Stirling-PDF keeps state:
tessdata/ holds Tesseract OCR language files - the data that lets it read text out of scanned images
configs/ stores the app settings and, with login enabled, the user database
customFiles/ is for optional branding (custom logos, footer text, static overrides)
logs/ is exactly what it sounds like
Everything important to back up lives under /opt/stirling-pdf.
Create /opt/stirling-pdf/docker-compose.yml:
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    container_name: stirling-pdf
    restart: unless-stopped
    environment:
      DOCKER_ENABLE_SECURITY: "true"
      SECURITY_ENABLELOGIN: "true"
      SYSTEM_DEFAULTLOCALE: "en-US"
      UI_APPNAME: "PDF Tools"
      UI_HOMEDESCRIPTION: "My private, self-hosted PDF toolkit"
      UI_APPNAMENAVBAR: "PDF Tools"
      LANGS: "en_GB"
    volumes:
      - ./tessdata:/usr/share/tessdata
      - ./configs:/configs
      - ./customFiles:/customFiles
      - ./logs:/logs
    networks:
      - pdfnet
  caddy:
    image: caddy:2
    container_name: stirling-caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - ./caddy-data:/data
      - ./caddy-config:/config
    networks:
      - pdfnet
networks:
  pdfnet:
Notes:
DOCKER_ENABLE_SECURITY: "true" enables the security/login module in the container. It must be set for SECURITY_ENABLELOGIN to have any effect.
SECURITY_ENABLELOGIN: "true" puts a login wall in front of every tool. Leave this on - an open PDF toolkit on the public internet will get found and abused as free compute.
LANGS controls which OCR language packs the image downloads at startup. en_GB is a sensible default; add more like LANGS: "en_GB,de_DE,fr_FR" and the container fetches them on boot.
Port 8080 stays on the pdfnet Docker network. Caddy reaches it by container name, so there is no ports: mapping on the Stirling service.
Create /opt/stirling-pdf/Caddyfile:
pdf.example.com {
    encode zstd gzip
    reverse_proxy stirling-pdf:8080
}
Caddy automatically requests a Let's Encrypt certificate for pdf.example.com on first boot and renews it forever, with the HTTP-to-HTTPS redirect handled for you. Unlike nginx, Caddy does not impose a small default upload limit, so big scanned PDFs pass through without extra tuning.
cd /opt/stirling-pdf
sudo docker compose up -d
sudo docker compose logs -f
The first boot takes a minute or two: the container downloads the OCR language packs listed in LANGS and warms up the Java runtime. Wait until the Caddy logs show the certificate was issued and Stirling-PDF reports it is listening, then open https://pdf.example.com.
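Rather than watching the logs, you can poll the site until it answers. The sketch below is one way to do that; wait_for_url is a hypothetical helper name, and the attempt count and delay are arbitrary defaults.

```shell
# wait_for_url is a hypothetical helper: poll a URL until it stops returning errors.
wait_for_url() {
  local url="$1" tries="${2:-30}" delay="${3:-5}" i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl exit non-zero on HTTP errors (for example a 502 from Caddy
    # while the Java app is still starting); a redirect to the login page counts as success.
    if curl -fsS -o /dev/null --max-time 10 "$url"; then
      echo "up after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still not answering after $tries attempts" >&2
  return 1
}
```

For example, wait_for_url https://pdf.example.com 30 5 gives the stack roughly two and a half minutes before giving up.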
With login enabled, Stirling-PDF ships with one account:
Username: admin
Password: stirling
Log in immediately and change it. Go to the account menu (top right) -> Settings -> Change username/password and set a strong password from your password manager. While you are there, create separate accounts for anyone else who needs access rather than sharing the admin login.
Changing the default admin / stirling credentials is the single most important step. Bots scan for self-hosted apps and try known defaults first. Do this before you walk away from the server, not later.
The home page is a grid of every operation, grouped by category.
Every operation runs on the server and hands you a file to download. Nothing is uploaded to a third party, which is the entire point.
A scanned PDF is just a stack of images - you cannot select or search the text. OCR (optical character recognition) fixes that by recognizing the characters and layering searchable text behind the image. Stirling-PDF uses Tesseract, and Tesseract needs a language data file for each language you want to read.
If you set LANGS in the compose file, the matching packs were already downloaded into tessdata/ on first boot. To add a language by hand, drop its .traineddata file into the folder:
cd /opt/stirling-pdf/tessdata
sudo curl -fsSLO https://github.com/tesseract-ocr/tessdata/raw/main/deu.traineddata
sudo docker compose restart stirling-pdf
Now open the OCR / Cleanup tool, upload a scanned document, pick the language, and choose whether to embed a searchable text layer (the usual choice) or convert to fully selectable text. The result is a PDF you can search and copy from.
OCR is the most CPU-intensive thing Stirling-PDF does. A large multi-page scan can peg a core for a while - that is normal. On a small shared VPS, process big batches one at a time.
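Tesseract data files are named with three-letter codes (eng, deu, fra) rather than locale-style codes such as en_GB. If you script the download, a small lookup avoids fetching a file that does not exist. The sketch below covers only a handful of locales; both helper names are hypothetical, and the URL pattern is the one used in the curl command above.

```shell
# tess_code is a hypothetical helper: map a locale-style code to a Tesseract data code.
tess_code() {
  case "$1" in
    en_GB|en_US) echo eng ;;
    de_DE) echo deu ;;
    fr_FR) echo fra ;;
    es_ES) echo spa ;;
    it_IT) echo ita ;;
    *) echo "no mapping for locale: $1" >&2; return 1 ;;
  esac
}

# tessdata_url builds the download URL for the matching .traineddata file.
tessdata_url() {
  local code
  code="$(tess_code "$1")" || return 1
  echo "https://github.com/tesseract-ocr/tessdata/raw/main/${code}.traineddata"
}
```

From inside /opt/stirling-pdf/tessdata, something like sudo curl -fsSLO "$(tessdata_url fr_FR)" then fetches the French pack; extend the case statement for any other languages you need.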
Stirling-PDF has a Pipeline feature that chains operations into a saved workflow - for example "OCR, then compress, then add a watermark" applied to every file you drop in. You build the pipeline once in the UI, export it as a JSON file, and it lives in your configs/ directory. For a small office that always processes incoming scans the same way, this turns a five-click chore into a single action.
For true hands-off automation you can also call the REST API directly. Every tool in the UI has a matching endpoint, documented at https://pdf.example.com/swagger-ui/index.html. A nightly cron job that flattens and compresses a folder of invoices is a few lines of curl.
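As a rough sketch of what such a job might look like, the function below posts one file to a compression endpoint. Everything specific here is an assumption to verify against your own Swagger page before relying on it: the endpoint path /api/v1/misc/compress-pdf, the form fields fileInput and optimizeLevel, and the X-API-KEY header. The helper name and the two environment variables are hypothetical.

```shell
# compress_pdf is a hypothetical helper. The endpoint path, form field names and
# API key header are assumptions; confirm them in /swagger-ui/index.html first.
compress_pdf() {
  local src="$1" dest="$2"
  curl -fsS -X POST "${STIRLING_URL:?set STIRLING_URL}/api/v1/misc/compress-pdf" \
    -H "X-API-KEY: ${STIRLING_API_KEY:?set STIRLING_API_KEY}" \
    -F "fileInput=@${src}" \
    -F "optimizeLevel=3" \
    -o "$dest"
}
```

With STIRLING_URL and STIRLING_API_KEY exported, a nightly job could loop over a folder and call compress_pdf "$f" "/srv/compressed/$(basename "$f")" for each file; the folder paths are placeholders.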
The files themselves are transient, but your settings, user accounts, custom branding, and saved pipelines are not. They live in configs/ and customFiles/. Pair them with a nightly restic job to object storage:
restic -r s3:s3.amazonaws.com/my-backup-bucket backup \
/opt/stirling-pdf/configs /opt/stirling-pdf/customFiles
The data is tiny - usually a few megabytes - so backups are instant. Restoring is just dropping the folders back and starting the container.
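To run this nightly, one option is a small wrapper that cron calls. The sketch below assumes the repository is already initialised and that restic's password and storage credentials are supplied through the environment; the function name and the retention numbers are illustrative choices, not recommendations from the Stirling-PDF project.

```shell
# backup_stirling is a hypothetical wrapper: back up the two state folders, then trim old snapshots.
backup_stirling() {
  local repo="${RESTIC_REPOSITORY:-s3:s3.amazonaws.com/my-backup-bucket}"
  restic -r "$repo" backup /opt/stirling-pdf/configs /opt/stirling-pdf/customFiles || return 1
  # Keep a week of dailies and a month of weeklies; --prune deletes data no snapshot needs.
  restic -r "$repo" forget --keep-daily 7 --keep-weekly 4 --prune
}
```

Saved into a script that ends by calling backup_stirling, a crontab entry such as 15 3 * * * /usr/local/bin/stirling-backup.sh runs it at 03:15 each night; the script path is a placeholder.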
If you are the only person using the toolkit, it does not need to face the public internet at all. Putting Stirling-PDF behind Tailscale removes every bot scan and login attempt in one move - skip the firewall openings for 80 and 443 and reach pdf.example.com over your tailnet instead. Bear in mind that with those ports closed Caddy can no longer complete the Let's Encrypt HTTP challenge, so plan on a DNS-based challenge or Caddy's internal certificates for HTTPS. A WireGuard VPN on your VPS does the job just as well if you already run one. For a shared instance with a real login, the public Caddy setup above is fine - just keep the password strong.
Caddy returns a 502 right after docker compose up. Stirling-PDF takes a while to start its Java runtime and download language packs on first boot, so Caddy is usually ready and proxying before the app is listening. Give it a minute or two and reload. If the error persists, check docker compose logs stirling-pdf.
Login is enabled but I cannot get past the password page. Both variables are required together: DOCKER_ENABLE_SECURITY must be true for the login module to exist, and SECURITY_ENABLELOGIN must be true to switch it on. If you set only the second one, the toggle does nothing. Fix both and run docker compose up -d to recreate the container.
OCR says no languages are available. The tessdata/ folder is empty or the file name is wrong. Confirm a .traineddata file is present (ls /opt/stirling-pdf/tessdata) and restart the container. Remember the codes are three letters - eng, deu, fra - not the two-letter LANGS codes.
Large file uploads fail or time out. This is almost always a reverse-proxy limit upstream of Caddy, not Stirling-PDF. If you put Cloudflare in front, free plans cap uploads at 100 MB. Connect directly to your server's hostname for big files, or split the document first.
The app feels sluggish under load. Java plus OCR plus compression is genuinely memory-hungry. If the container is being killed (docker compose logs shows it restarting), bump the VPS to 2 GB RAM or add swap.
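Before resizing anything, it helps to see how much memory headroom the host actually has. The sketch below reads MemAvailable from /proc/meminfo; the helper name is hypothetical, and the optional file argument exists only so the function can be pointed at a sample file.

```shell
# mem_available_mb is a hypothetical helper: report the kernel's MemAvailable figure in MB.
mem_available_mb() {
  # /proc/meminfo reports kB; divide by 1024 and truncate to whole megabytes.
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}
```

Run mem_available_mb while an OCR job is in progress; if the number drops toward zero, the 2 GB upgrade or a swap file is probably warranted. docker stats --no-stream stirling-pdf shows the container's own usage, and sudo dmesg | grep -i "out of memory" should reveal whether the kernel has been killing processes.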
Grab any .traineddata file from the Tesseract data repo and drop it into tessdata/ - there are over 100 languages, including vertical scripts.
The API reference at /swagger-ui/index.html documents every endpoint, so you can script repetitive jobs from another machine or a cron task.
That's it. A self-hosted PDF toolkit gives you every common document operation in one place, fast, and without uploading a single sensitive file to a website you have never heard of.