One missing piece of AF_VSOCK in the Linux kernel has been network namespace support. We discussed it as a future challenge during the KVM Forum 2019 talk, and it has come up in several conference discussions since then.
I started working on namespace support back in 2019, but never had the chance
to complete it. Last year, Bobby Eshleman (Meta) restarted the effort and drove
it through 16 revisions of the patch series. Daniel Berrangé, Michael S.
Tsirkin, Paolo Abeni, and I contributed reviews and suggestions that shaped
the current user API. The result has been merged into net-next and will be
available in Linux 7.0.
Background
Network namespaces are a fundamental building block for containers in Linux. They provide isolation of the network stack, so each namespace has its own interfaces, routing tables, and sockets.
Before Linux 7.0, AF_VSOCK was not namespace-aware. All vsock sockets lived in the same global space, regardless of the network namespace they were created in. This caused two problems:
- No isolation: a VM started inside a network namespace (or container) was reachable via vsock from any other namespace on the host, breaking the isolation that containers expect.
- No CID reuse: since CIDs were global, two VMs in different namespaces could not use the same CID, even if they were completely isolated from each other at the network level.
Design
The new implementation introduces two modes, configured per network namespace:
- global: CIDs are shared across namespaces. This is the original behavior and the default, so existing setups continue to work without any change.
- local: namespaces are completely isolated. Sockets in a local-mode namespace can only communicate with other sockets in the same namespace.
Two sysctl knobs are available starting with Linux 7.0:
- /proc/sys/net/vsock/child_ns_mode: the parent namespace uses this to set the mode that new child namespaces will inherit. Accepts global or local.
- /proc/sys/net/vsock/ns_mode: read-only, shows the mode of the current namespace. The mode is immutable after namespace creation.
This design ensures backward compatibility: the default is global, matching
the previous behavior. Namespace isolation is opt-in.
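A quick way to see what a given kernel and namespace report is to read both knobs. The function below is a minimal sketch: `vsock_ns_info` and the `VSOCK_SYSCTL_DIR` override are hypothetical names, and it assumes the sysctl files are simply absent on kernels without vsock namespace support.

```shell
#!/bin/sh
# Print the vsock namespace knobs of the current network namespace.
# Assumption: the files only exist on kernels with vsock netns support
# (Linux 7.0+); if ns_mode is missing we report "unsupported".
# VSOCK_SYSCTL_DIR is a hypothetical override, handy for testing.
vsock_ns_info() {
    _dir=${VSOCK_SYSCTL_DIR:-/proc/sys/net/vsock}
    if [ ! -r "$_dir/ns_mode" ]; then
        echo "unsupported"
        return 1
    fi
    # ns_mode: fixed mode of this namespace (read-only)
    # child_ns_mode: mode that namespaces created from here will inherit
    printf 'ns_mode=%s child_ns_mode=%s\n' \
        "$(cat "$_dir/ns_mode")" "$(cat "$_dir/child_ns_mode")"
}
```

In the init_netns of a host with default settings it should print `ns_mode=global child_ns_mode=global`, since the init_netns is always global and global is the default.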
Each namespace gets its mode from the parent’s child_ns_mode at
creation time. Once set, the namespace’s ns_mode is immutable: every
socket and VM in that namespace follows it. Changing child_ns_mode
in the parent only affects future child namespaces, not existing ones.
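Because the mode is copied only at creation time, the parent's child_ns_mode can be switched temporarily and then put back without affecting the namespace just created. The sketch below applies that pattern to ip netns; `vsock_netns_add` is a hypothetical helper, it needs root, and any other namespace created concurrently from the same parent would also pick up the temporary mode.

```shell
#!/bin/sh
# Create a named network namespace with the given vsock mode (global|local),
# then restore the parent's previous child_ns_mode.
# Hypothetical helper; requires root (via sudo) and iproute2.
vsock_netns_add() {
    _name=$1 _mode=$2
    if [ -z "$_name" ]; then
        echo "usage: vsock_netns_add NAME global|local" >&2
        return 2
    fi
    case $_mode in
        global|local) ;;
        *) echo "usage: vsock_netns_add NAME global|local" >&2; return 2 ;;
    esac
    _knob=/proc/sys/net/vsock/child_ns_mode
    _prev=$(cat "$_knob") || return 1
    # The new namespace copies child_ns_mode when it is created...
    echo "$_mode" | sudo tee "$_knob" >/dev/null || return 1
    sudo ip netns add "$_name"
    _rc=$?
    # ...so restoring the old value afterwards leaves it untouched.
    echo "$_prev" | sudo tee "$_knob" >/dev/null
    [ "$_rc" -eq 0 ] || return "$_rc"
    # ns_mode is read-only and should now match the requested mode.
    sudo ip netns exec "$_name" cat /proc/sys/net/vsock/ns_mode
}
```

For example, `vsock_netns_add vsock_ns_local local` should print `local`, while the parent's child_ns_mode keeps its previous value.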
Supported vsock transports
This series adds namespace support to two transports:
- vhost-vsock: host-to-guest (H2G) transport, emulates the virtio-vsock device for KVM guests
- vsock-loopback: local transport, useful for testing and debugging without running VMs
The transports still lacking namespace support are the guest-to-host (G2H) ones (virtio, hyperv, vmci).
These run in the guest as device drivers, and we currently don’t have a way to
assign a vsock device to a specific namespace, since vsock devices are not
standard network devices. For now, they operate in global mode, so they are reachable from
any global namespace, but not from local namespaces. This means that
sockets in a local namespace cannot communicate with the host through
these transports. We plan to work on that in the future.
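To see which of these transports are loaded on a machine, one can look for their modules under /sys/module. This sketch assumes the usual upstream module names (vhost_vsock, vsock_loopback, vmw_vsock_virtio_transport, hv_sock, vmw_vsock_vmci_transport) and that a loaded module appears as a directory there, which may not hold for drivers built into the kernel; `vsock_transports` and `SYSFS_MODULE_DIR` are hypothetical names.

```shell
#!/bin/sh
# List loaded vsock transport modules and whether, per the text above,
# they honour network namespaces. SYSFS_MODULE_DIR is a hypothetical
# override for testing.
vsock_transports() {
    _dir=${SYSFS_MODULE_DIR:-/sys/module}
    for _mod in vhost_vsock vsock_loopback \
                vmw_vsock_virtio_transport hv_sock vmw_vsock_vmci_transport; do
        # Skip modules that are not loaded (no directory in sysfs).
        [ -d "$_dir/$_mod" ] || continue
        case $_mod in
            vhost_vsock|vsock_loopback) echo "$_mod: namespace-aware" ;;
            *) echo "$_mod: global mode only" ;;
        esac
    done
}
```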
Examples
Loopback
In the following examples, the commands without a namespace prefix run in the
initial network namespace (init_netns), which is the default namespace where
all processes start. The init_netns is always in global mode.
These examples use the vsock loopback device for local communication, without any VM involved.
Make sure the vsock_loopback kernel module is loaded:
$ sudo modprobe vsock_loopback
Namespace isolation with unshare
Global mode (default)
By default, child_ns_mode is set to global. This is the same behavior
as before Linux 7.0: vsock sockets are shared across namespaces.
A listener started in a new namespace is reachable from the init_netns
using the loopback CID (VMADDR_CID_LOCAL = 1):
$ echo global | sudo tee /proc/sys/net/vsock/child_ns_mode
$ unshare --user --net nc --vsock -l 1234 &
$ nc --vsock 1 1234
# reachable - global mode, no isolation
Local mode
Setting child_ns_mode to local enables isolation. New namespaces will
have their own vsock space:
$ echo local | sudo tee /proc/sys/net/vsock/child_ns_mode
Now a listener in a new namespace is not reachable from the init_netns:
$ unshare --user --net nc --vsock -l 1234 &
$ nc --vsock 1 1234
Ncat: Connection reset by peer.
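The reachable/unreachable check above can be wrapped in a small function. Note that from the client side, a listener in an isolated namespace looks the same as no listener at all: both end in the reset shown above. The sketch assumes ncat's --send-only option (quit once stdin reaches EOF) and that ncat exits non-zero when the connection fails; `vsock_probe` and the `NC` override are hypothetical.

```shell
#!/bin/sh
# Report whether a vsock listener at CID PORT is reachable from the current
# network namespace, using the same ncat invocation as above.
# NC is a hypothetical override so the logic can be tested without vsock.
vsock_probe() {
    _cid=$1 _port=$2
    # stdin is /dev/null, so with --send-only ncat connects and quits at EOF.
    if ${NC:-nc} --vsock --send-only "$_cid" "$_port" </dev/null >/dev/null 2>&1; then
        echo reachable
    else
        echo unreachable
    fi
}
```

With the unshare listener from the previous step running, `vsock_probe 1 1234` should print `reachable` in global mode and `unreachable` in local mode.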
Namespace isolation with ip netns
The same can be done with ip netns, which requires root (or CAP_SYS_ADMIN).
First, create a global namespace and check its mode:
$ echo global | sudo tee /proc/sys/net/vsock/child_ns_mode
$ sudo ip netns add vsock_ns_global
$ sudo ip netns exec vsock_ns_global cat /proc/sys/net/vsock/ns_mode
global
A listener in the global namespace is reachable from the init_netns:
$ sudo ip netns exec vsock_ns_global nc --vsock -l 1234 &
$ nc --vsock 1 1234
# reachable - global mode, no isolation
Now create a local namespace and check its mode:
$ echo local | sudo tee /proc/sys/net/vsock/child_ns_mode
$ sudo ip netns add vsock_ns_local
$ sudo ip netns exec vsock_ns_local cat /proc/sys/net/vsock/ns_mode
local
A listener in the local namespace is not reachable from the init_netns:
$ sudo ip netns exec vsock_ns_local nc --vsock -l 1234 &
$ nc --vsock 1 1234
Ncat: Connection reset by peer.
But communication within the same namespace still works:
$ sudo ip netns exec vsock_ns_local nc --vsock 1 1234
# reachable - same namespace
Container isolation with podman
Since podman creates a network namespace for each container by default,
vsock namespace support applies to containers as well.
First, build a Fedora-based image with ncat installed:
$ podman build -t fedora-ncat - <<< "FROM fedora
RUN dnf -y install nmap-ncat"
With the default global mode, two containers share the same vsock space.
A listener in one container is reachable from another global container:
$ echo global | sudo tee /proc/sys/net/vsock/child_ns_mode
$ podman run --rm --init -d fedora-ncat sh -c "echo hello world | nc --vsock -l 1234"
$ podman run --rm --init -it fedora-ncat nc --vsock 1 1234
hello world
With local mode, each container gets its own isolated vsock namespace:
$ echo local | sudo tee /proc/sys/net/vsock/child_ns_mode
$ podman run --rm --init -d fedora-ncat sh -c "echo hello world | nc --vsock -l 1234"
$ podman run --rm --init -it fedora-ncat nc --vsock 1 1234
Ncat: Connection reset by peer.
# containers are isolated from each other
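To check which mode a container actually received, one can read ns_mode from inside it. This is a sketch assuming the fedora-ncat image built above (any image with cat would do) and that /proc/sys is readable in the container; `container_vsock_mode` and the `PODMAN` override are hypothetical.

```shell
#!/bin/sh
# Start a throwaway container and print the vsock mode of its network
# namespace. Optional first argument: image name (default fedora-ncat).
# PODMAN is a hypothetical override used only for testing.
container_vsock_mode() {
    ${PODMAN:-podman} run --rm "${1:-fedora-ncat}" cat /proc/sys/net/vsock/ns_mode
}
```

Running it after each change to child_ns_mode shows the mode that a freshly started container ended up with.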
VMs with QEMU
The vhost-vsock H2G transport exposes the /dev/vhost-vsock device, which
QEMU opens at VM startup to emulate the virtio-vsock device for the guest.
Since namespace support applies to this transport, VMs inherit the namespace
mode as well.
In the following examples, we reuse the vsock_ns_global and vsock_ns_local
namespaces created in the previous section.
Global mode
With global mode, the VM started in a global namespace is reachable
from any other global namespace, including the init_netns:
$ sudo ip netns exec vsock_ns_global \
qemu-system-x86_64 -m 1G -M q35,accel=kvm \
-drive file=guest.qcow2,if=virtio,snapshot=on \
-device vhost-vsock-pci,guest-cid=42
# start a listener in the guest (global namespace)
guest_global$ nc --vsock -l 1234
# from the init_netns (global) - reachable
$ nc --vsock 42 1234
Local mode
With local mode, the VM is only reachable from within the same namespace:
$ sudo ip netns exec vsock_ns_local \
qemu-system-x86_64 -m 1G -M q35,accel=kvm \
-drive file=guest.qcow2,if=virtio,snapshot=on \
-device vhost-vsock-pci,guest-cid=42
# start a listener in the guest (local namespace)
guest_local$ nc --vsock -l 1234
# from the init_netns (global) - isolated
$ nc --vsock 42 1234
Ncat: Connection reset by peer.
# from the same namespace - reachable
$ sudo ip netns exec vsock_ns_local nc --vsock 42 1234
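The two QEMU command lines above differ only in the namespace, so they can share a wrapper. In this sketch `vsock_vm_start` is a hypothetical helper that reuses the memory size and image name from the examples; it rejects CIDs 0 to 2, which are reserved for the hypervisor, local loopback, and host (VMADDR_CID_HYPERVISOR, VMADDR_CID_LOCAL, VMADDR_CID_HOST).

```shell
#!/bin/sh
# Launch the guest used above inside network namespace NETNS with the
# given vsock CID. Hypothetical helper; needs root and KVM.
vsock_vm_start() {
    _ns=$1 _cid=$2 _img=${3:-guest.qcow2}
    # Treat an empty or non-numeric CID as invalid.
    case $_cid in
        ''|*[!0-9]*) _cid=0 ;;
    esac
    # CIDs 0 to 2 are reserved, so a guest needs a CID of 3 or more.
    if [ -z "$_ns" ] || [ "$_cid" -lt 3 ]; then
        echo "usage: vsock_vm_start NETNS CID(>=3) [IMAGE]" >&2
        return 2
    fi
    # The CID must be unique across all global namespaces, or only
    # within NETNS when NETNS is a local namespace.
    sudo ip netns exec "$_ns" qemu-system-x86_64 -m 1G -M q35,accel=kvm \
        -drive file="$_img",if=virtio,snapshot=on \
        -device vhost-vsock-pci,guest-cid="$_cid"
}
```

For instance, `vsock_vm_start vsock_ns_local 42` starts the local-mode VM shown above.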
Guest-to-host (G2H) behavior
As mentioned in the "Supported vsock transports" section, the G2H virtio transport does not support namespaces yet. The
virtio-vsock device in the guest always operates in global mode, so
only sockets in global namespaces can communicate with the host.
Using the VM started in vsock_ns_global, a listener in the guest’s
init_netns is reachable from the host:
# start a listener in the guest
guest_global$ nc --vsock -l 1234
# from the host - reachable
$ nc --vsock 42 1234
But a listener started in a local namespace inside the guest is not
reachable from the host:
# create a local namespace in the guest and start a listener
guest_global$ echo local | sudo tee /proc/sys/net/vsock/child_ns_mode
guest_global$ unshare --user --net nc --vsock -l 1234
# from the host - isolated
$ nc --vsock 42 1234
Ncat: Connection reset by peer.
CID reuse
Note that we used the same CID (42) in both examples without shutting down
the first VM. This is possible because the second VM is in a local
namespace, which has its own isolated CID space. Had the second VM also been
started in a global namespace, QEMU would have failed to start it because
CID 42 was already in use.
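When done with the examples, the namespaces can be removed. ip netns delete only removes the namespace's name, and the namespace itself lives on while processes (such as the background nc listeners or QEMU) still run inside it, so this sketch stops those processes first. `vsock_netns_cleanup` is a hypothetical helper and needs root.

```shell
#!/bin/sh
# Stop every process still running in each named namespace, then delete it.
# Hypothetical helper; uses the iproute2 "ip netns pids" and "delete" commands.
vsock_netns_cleanup() {
    if [ "$#" -eq 0 ]; then
        echo "usage: vsock_netns_cleanup NAME..." >&2
        return 2
    fi
    for _ns in "$@"; do
        # Kill leftover listeners or VMs; xargs -r skips kill if none remain.
        sudo ip netns pids "$_ns" | xargs -r sudo kill
        sudo ip netns delete "$_ns"
    done
}
```

For example, `vsock_netns_cleanup vsock_ns_global vsock_ns_local` removes both namespaces used in this post, along with any VMs still running in them.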