A default Docker install on a VPS is convenient and dangerous in equal measure. The daemon runs as root, the docker group is a passwordless sudo, containers ship with broad capabilities, and one bad image or one mounted socket can hand an attacker the host.
Every one of those problems has a fix that ships with Docker itself. This guide walks through the changes that move the needle on a single-host VPS, in the order I'd apply them on a fresh server.
The short version:

- --read-only, no-new-privileges, and a tmpfs for /tmp on every container
- no humans in the docker group, ever
- mem_limit and cpus so one container can't starve the rest

This guide assumes Docker Engine, not Docker Desktop. The commands target a standard Linux host.
The default installer adds your user to the docker group so you can run docker without sudo. That group membership is equivalent to root. Anyone in the docker group can run:
docker run --rm -v /:/host alpine chroot /host sh
That's a root shell on the host. No password prompt, no audit trail. There is no way to limit what a docker group member can do short of removing them from the group.
The fix is one of:
1. Run rootless Docker, so there is no docker group at all.
2. Use sudo docker for everything and don't add humans to the docker group.
3. Accept the risk and treat every docker group member as a root user.

If you share a server with people you wouldn't give full sudo, options 1 and 2 are the only honest answers.
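If you go with the sudo-only route on an existing server, auditing and emptying the group takes a minute (alice is an example username):

```shell
# See who is currently in the docker group
getent group docker

# Remove a user from it (example user "alice"); the change takes
# effect at their next login, after which docker requires sudo
sudo gpasswd -d alice docker
```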
Rootless mode runs dockerd as a regular user. The daemon, containers, and network stack all live in a user namespace owned by your account. A container escape gives the attacker your unprivileged shell, not root.
First, install the prerequisites and the rootless extras package:
sudo apt update
sudo apt install -y uidmap dbus-user-session fuse-overlayfs slirp4netns \
docker-ce-rootless-extras
If you already have rootful Docker installed, disable the system service so the rootless one can take over:
sudo systemctl disable --now docker.service docker.socket
sudo rm -f /var/run/docker.sock
Then, as your regular user (not root):
dockerd-rootless-setuptool.sh install
The script writes a systemd user unit and prints the environment variables you need. Add them to your shell profile:
cat >> ~/.bashrc <<'EOF'
export PATH=/usr/bin:$PATH
export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock
EOF
source ~/.bashrc
Enable lingering so the daemon survives logout:
sudo loginctl enable-linger "$USER"
systemctl --user enable --now docker
Verify:
docker info | grep -i rootless
You should see rootless listed under Security Options.
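Another quick sanity check: DOCKER_HOST should point at your per-user runtime socket, not /var/run/docker.sock. The uid portion comes from id -u:

```shell
# Print the socket path the rootless daemon listens on; compare it
# against the DOCKER_HOST value in your shell profile
echo "unix:///run/user/$(id -u)/docker.sock"
```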
Some workloads still need rootful Docker. For those, enable user namespace remapping: the daemon maps container UID 0 to a high unprivileged UID on the host, so a break-out leaves the attacker as dockremap rather than root.
Edit (or create) /etc/docker/daemon.json:
{
"userns-remap": "default",
"live-restore": true,
"no-new-privileges": true,
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "5"
},
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 65536,
"Soft": 65536
}
}
}
Then restart:
sudo systemctl restart docker
The default value tells Docker to create a dockremap user and map the container UID range to it. New directories appear under /var/lib/docker/<subuid>.<subgid>/. Existing image and volume data lives in the old paths, so plan a migration.
A few things break under userns-remap:
- Bind mounts: files on the host end up owned by the remapped UIDs, not yours. chown the host path to the remapped UID.
- BuildKit: builds fail. Use the legacy builder (DOCKER_BUILDKIT=0 docker build .) or build images on a separate host.
- Per-container opt-out: --userns=host disables the remap for that container. Don't combine --userns=host with anything you care about.

If those constraints are too painful, prefer Step 2.
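The bind-mount ownership fix is mechanical once you know the offset. With the typical default range dockremap:100000:65536 in /etc/subuid (check yours; the base varies), container UID N shows up on the host as 100000 + N:

```shell
# Map a container UID to its host UID under userns-remap.
# base comes from /etc/subuid (dockremap:100000:65536 assumed here).
base=100000
container_uid=33                       # e.g. www-data in a Debian-based image
host_uid=$(( base + container_uid ))
echo "$host_uid"                       # 100033

# Then fix a bind-mounted host path accordingly:
# sudo chown -R "$host_uid:$host_uid" /srv/app/data
```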
Linux capabilities are the granular pieces of root. By default, Docker grants a generous subset including NET_RAW, SYS_CHROOT, and SETUID. Most apps need none of these. Drop everything and add back only what's required:
docker run \
--cap-drop=ALL \
--cap-add=NET_BIND_SERVICE \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=64m \
--security-opt no-new-privileges \
--memory=512m \
--cpus=1.0 \
nginx:alpine
What each flag does:
- --cap-drop=ALL strips every capability from the container.
- --cap-add=NET_BIND_SERVICE lets the process bind to ports below 1024. nginx needs this for port 80; most other apps don't.
- --read-only mounts the root filesystem read-only. The container can't drop a webshell on disk because there's no writable disk.
- --tmpfs /tmp gives the app a writable scratch area in RAM. It vanishes on container restart, which is what you want.
- --security-opt no-new-privileges prevents setuid binaries from raising privileges inside the container. Stops a whole class of escalation tricks.
- --memory and --cpus cap resource usage so a runaway worker can't kill the host.

Capabilities you might legitimately need:
- NET_BIND_SERVICE: bind to ports under 1024
- CHOWN, DAC_OVERRIDE, FOWNER: writing files as different users (most package managers in entrypoints)
- SETUID, SETGID: dropping privileges inside the container

If your container fails with a permissions error after you drop everything, strace it or check the logs for the missing capability and add only that one back.
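When you're hunting for the missing capability, the kernel shows exactly what a process holds: grep Cap /proc/PID/status and decode the hex mask. A pure-shell check for one bit, using Docker's well-known default capability mask as the example value (capability numbers are in capabilities(7)):

```shell
# Decode one bit of a CapEff mask from /proc/<pid>/status.
# 0x00000000a80425fb is Docker's longstanding default capability set.
mask=$(( 0x00000000a80425fb ))
cap=10    # CAP_NET_BIND_SERVICE is capability number 10

if [ $(( (mask >> cap) & 1 )) -eq 1 ]; then
  echo "NET_BIND_SERVICE is present"
else
  echo "NET_BIND_SERVICE is missing"
fi
```

An all-zero CapEff line means every capability was dropped, which is exactly what you should see after --cap-drop=ALL.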
You probably aren't running everything from the command line. The same flags translate cleanly to a Compose file:
services:
web:
image: ghcr.io/example/web:1.4.2
restart: unless-stopped
read_only: true
tmpfs:
- /tmp:rw,noexec,nosuid,size=64m
- /run:rw,noexec,nosuid,size=8m
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE
security_opt:
- no-new-privileges:true
mem_limit: 512m
mem_reservation: 256m
cpus: 1.0
pids_limit: 200
ulimits:
nofile:
soft: 4096
hard: 8192
logging:
driver: json-file
options:
max-size: "10m"
max-file: "5"
environment:
NODE_ENV: production
networks:
- frontend
healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
interval: 30s
timeout: 5s
retries: 3
networks:
frontend:
A few things to call out:
- Pin images to an exact tag (1.4.2), not latest. Pinning is a security control: it stops a poisoned upstream tag from rolling into production on the next docker compose pull.
- pids_limit stops a fork bomb inside the container from taking the host with it.
- mem_limit and cpus are non-negotiable on a multi-tenant VPS. Without them the OOM killer becomes a lottery.
- The logging overrides ride on top of any daemon-level defaults you set in daemon.json. Belt and suspenders.

If your app legitimately needs to write somewhere persistent, add a named volume and keep the rest read-only. Don't relax read_only for the whole service just because one cache directory needs to be writable.
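That last point in practice: a named volume for the one writable path, everything else still read-only (the cache path here is an example):

```yaml
services:
  web:
    image: ghcr.io/example/web:1.4.2
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=64m
    volumes:
      - webcache:/var/cache/app   # the one directory that needs persistence

volumes:
  webcache:
```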
The hardening above protects against runtime exploits. It does nothing against an image that ships with a known-vulnerable OpenSSL or a backdoor in a transitive npm dependency. For that, you need a scanner.
Trivy is fast, free, and runs as a single binary:
sudo apt install -y wget gnupg
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | \
sudo gpg --dearmor -o /usr/share/keyrings/trivy.gpg
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] \
https://aquasecurity.github.io/trivy-repo/deb generic main" | \
sudo tee /etc/apt/sources.list.d/trivy.list
sudo apt update
sudo apt install -y trivy
Then scan an image before you run it:
trivy image --severity HIGH,CRITICAL --exit-code 1 ghcr.io/example/web:1.4.2
The --exit-code 1 flag makes Trivy fail the command if it finds anything HIGH or CRITICAL. Wire it into your deploy script and the script will refuse to roll out a vulnerable image.
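That wiring can be sketched as a small POSIX-shell gate. deploy_if_clean is a name I made up, and the rollout command is a placeholder; the commented line shows the real call:

```shell
# Run a command (the scanner) and only proceed when it exits 0.
deploy_if_clean() {
  if "$@"; then
    echo "scan clean, deploying"
    # docker compose up -d    # your real rollout step goes here
  else
    echo "scan failed; refusing to deploy" >&2
    return 1
  fi
}

# Real usage, assuming trivy is installed:
# deploy_if_clean trivy image --severity HIGH,CRITICAL --exit-code 1 ghcr.io/example/web:1.4.2
```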
For an existing host, scan everything you've already pulled:
docker images --format '{{.Repository}}:{{.Tag}}' | \
grep -v '<none>' | \
xargs -I{} trivy image --severity HIGH,CRITICAL --quiet {}
The first scan on a fresh server is usually a wake-up call. Pin newer base images, rebuild, rescan.
A noisy container with the default json-file log driver will quietly eat your disk. There is no rotation by default. I've seen 80 GB of logs from one misbehaving worker that nobody noticed until the host ran out of disk space.
The daemon.json snippet from Step 3 sets a sensible global default: 10 MB per file, 5 files retained, per container. That's 50 MB max per container, which is plenty for triage and small enough to be safe.
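The budget math is worth doing once for your whole fleet (20 containers is an example count):

```shell
# Worst-case log disk usage = max-size x max-file x container count
max_size_mb=10
max_file=5
containers=20
echo "$(( max_size_mb * max_file * containers )) MB worst case"
```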
If you want to ship logs off-host instead, swap the driver:
{
"log-driver": "journald"
}
journald rotates with the rest of the system journal and integrates with journalctl -u docker.service CONTAINER_NAME=foo.
The Docker daemon socket lives at /var/run/docker.sock (rootful) or $XDG_RUNTIME_DIR/docker.sock (rootless). Anything that can talk to that socket controls the daemon, which means it controls the host.
Two rules I treat as non-negotiable:
1. Never mount /var/run/docker.sock into a container that runs untrusted code. That includes Watchtower, Portainer agents, CI runners, and webhook receivers. If the container is compromised, so is the host.
2. If a tool genuinely needs the Docker API, put a filtering proxy between the tool and the socket so it only sees the endpoints it needs.

For example, tecnativa/docker-socket-proxy in front of Watchtower:

services:
socket-proxy:
image: tecnativa/docker-socket-proxy:latest
environment:
CONTAINERS: 1
IMAGES: 1
POST: 0
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
restart: unless-stopped
watchtower:
image: containrrr/watchtower
environment:
DOCKER_HOST: tcp://socket-proxy:2375
depends_on:
- socket-proxy
restart: unless-stopped
The proxy turns a root-equivalent socket into a tightly scoped HTTP API. With the settings above, Watchtower can list containers and inspect images but nothing more; to let it actually pull updates and recreate containers, flip POST to 1 and enable the proxy's ALLOW_RESTARTS option, which is still far narrower than handing it the raw socket and mounting the host filesystem.
After all of the above, walk through one of your running services and confirm the hardening is actually applied:
docker inspect myservice --format '
ReadOnly: {{.HostConfig.ReadonlyRootfs}}
CapDrop: {{.HostConfig.CapDrop}}
CapAdd: {{.HostConfig.CapAdd}}
SecurityOpt: {{.HostConfig.SecurityOpt}}
Memory: {{.HostConfig.Memory}}
NanoCPUs: {{.HostConfig.NanoCpus}}
PidsLimit: {{.HostConfig.PidsLimit}}
'
You're looking for:
- ReadOnly: true
- CapDrop: [ALL]
- CapAdd: only the capabilities you intentionally granted
- SecurityOpt: includes no-new-privileges
- Memory and NanoCPUs are non-zero
- PidsLimit is set

If any of those are empty, the Compose file or the run command isn't doing what you think it is.
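To sweep every running container instead of one, pipe docker inspect output through a small filter. audit_nnp is a hypothetical helper; the comment shows how to feed it live data:

```shell
# Reads "name [securityopts]" lines and flags containers missing
# no-new-privileges. Feed it live data with:
#   docker ps -q | xargs docker inspect \
#     --format '{{.Name}} {{.HostConfig.SecurityOpt}}' | audit_nnp
audit_nnp() {
  while read -r name opts; do
    case "$opts" in
      *no-new-privileges*) : ;;                       # hardened, nothing to report
      *) echo "no-new-privileges MISSING on $name" ;;
    esac
  done
}
```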
Rootless Docker can't bind to port 80 or 443. That's expected; rootless mode cannot bind to ports below 1024 without extra configuration. Either run your reverse proxy on the host (rootful, but exposed only to localhost) and proxy to a high port in the rootless namespace, or set net.ipv4.ip_unprivileged_port_start=80 in /etc/sysctl.d/99-rootless.conf and apply it with sudo sysctl --system. The latter has its own implications, since any user can then bind low ports.
BuildKit fails after enabling userns-remap. This is a known limitation. Build with the legacy builder using DOCKER_BUILDKIT=0 docker build ., or do your image builds on a separate host (or in CI) that doesn't have userns-remap enabled. Push the built image and pull it on the hardened host.
An AppArmor profile blocks a legitimate workload. Symptoms include processes failing to write to a path that the bind mount clearly allows, or odd EACCES errors from inside an otherwise fine container. Check with dmesg | grep DENIED. The pragmatic fix is to write a custom AppArmor profile for that one image (--security-opt apparmor=my-profile) rather than disabling AppArmor system-wide.
Can't bind-mount /etc/passwd in rootless mode. Rootless Docker maps your host UID range into the container, and the container's view of /etc/passwd is a remapped one. Mounting the host file directly produces UID mismatches. Generate a passwd-style file inside the container instead, or pre-bake the user into the image.
Trivy reports zero vulnerabilities on a six-month-old image. Almost certainly stale CVE data. Run trivy image --download-db-only to refresh the vulnerability database (on recent Trivy versions, trivy clean --all wipes the cache entirely) and rescan. The DB ships separately from the binary.
A few things to look at once the basics above are in place:
- A sandboxed runtime such as gVisor or Kata Containers, for isolation well beyond dropping capabilities and avoiding --privileged. Worth it for multi-tenant setups.
- A host firewall (ufw or nftables) is your last line; treat it that way.

None of these are required to be safe. They're what you reach for when you've outgrown the basics.
That's the working set. Run rootless when you can, drop capabilities aggressively, scan images, and treat the docker socket and the docker group with the same respect you'd give the root password. The defaults aren't safe, but the safe configuration isn't far away.
Looking for a VPS that's ready for hardened Docker workloads? Our Linux plans ship with NVMe storage, IPv6, and full root access. See the options.