Is a docker container that requires "clone" and "unshare" syscalls of any use?

Çağlar Arlı - 24 Views

I am not very skilled at Linux, and although I have read about clone and unshare, I would love a second opinion on the situation:

The setup

The process I intend to run is an automated "chrome service" that visits sites, some of which may be insecure. Chrome itself runs in a sandbox, which is quite secure, but at least some sources claim that it can be escaped.

My intention is to isolate the chrome service by putting it inside a Docker container and running it as an unprivileged user (via Docker's USER instruction). The problem: Chrome refuses to run in an unprivileged container. It usually requires --cap-add=SYS_ADMIN to be able to create its sandbox.

I spent a lot of time understanding Docker's seccomp support (and realized that Docker's default seccomp profile on WSL is far more permissive [WTF] than on a 'real' Linux VM). (Side track: I downloaded the default seccomp profile and verified that WSL now behaves much more strictly.)

So I know I could use a custom seccomp profile to make Chrome work inside a container.
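To make the idea concrete, here is a minimal sketch of how such a custom profile could be derived from an existing one. It assumes the profile follows Docker's JSON layout (a top-level "syscalls" list of rules, each with "names" and "action"). The helper name allow_syscalls and the tiny stand-in base profile are hypothetical; in practice the base would be the downloaded default profile, and how an added rule interacts with its existing argument-filtered rules would need to be checked.

```python
import json


def allow_syscalls(profile: dict, names: list[str]) -> dict:
    """Return a copy of a seccomp profile with one extra unconditional allow rule."""
    # Deep copy via a JSON round trip so the input profile is not modified.
    patched = json.loads(json.dumps(profile))
    patched.setdefault("syscalls", []).append(
        {"names": sorted(names), "action": "SCMP_ACT_ALLOW"}
    )
    return patched


# Hypothetical stand-in for the real default profile (which is much longer).
base = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
}

chrome_profile = allow_syscalls(base, ["clone", "unshare"])
print(json.dumps(chrome_profile, indent=2))
```

The resulting JSON could be written to a file such as chrome.json (a placeholder name) and passed to the container with docker run --security-opt seccomp=chrome.json.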

The problem

What I have figured out: to create its sandbox, the Chromium process requires at least two syscalls that are not enabled in the default seccomp profile (they are enabled with SYS_ADMIN): clone and unshare.

If I understand the man pages correctly (and after asking ChatGPT some questions ;) ), giving these permissions to a container is actually not a good idea, because a malicious container could then escape its namespaces?

The options

What I cannot assess: what is the impact in my situation? I want to isolate the chrome service, but to do so I have to give the container the very privileges it could use to escape?

I could spawn a dedicated VM that runs only the Chrome process, uncontainerized, but VMs are more complicated to handle than containers. The upside would be that the impact would be contained to a VM that has no other containers or services running.

I could also try to run the chrome service with --no-sandbox and without giving it privileged syscall permissions. Then the Docker container itself would be 'insecure', but the impact would at least be contained in the container? (Unless a vulnerability in the container allows escaping to the host.)
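To compare the two container-based options side by side, here is a rough sketch that builds the corresponding docker run command lines. The image name chrome-service, the user 1000:1000, and the profile file chrome.json are placeholders, and the sketch assumes the image's entrypoint is the Chrome binary, so that arguments after the image name are passed to Chrome.

```python
import shlex


def docker_run_args(image, uid_gid, seccomp_profile=None, no_sandbox=False):
    """Build a docker run argument list for one of the two options."""
    args = ["docker", "run", "--rm", "--user", uid_gid]
    if seccomp_profile:
        # Option A: keep Chrome's sandbox, loosen seccomp with a custom profile.
        args += ["--security-opt", f"seccomp={seccomp_profile}"]
    args.append(image)
    if no_sandbox:
        # Option B: default seccomp profile, but Chrome's own sandbox is disabled.
        args.append("--no-sandbox")
    return args


sandboxed = docker_run_args("chrome-service", "1000:1000", seccomp_profile="chrome.json")
unsandboxed = docker_run_args("chrome-service", "1000:1000", no_sandbox=True)
print(shlex.join(sandboxed))
print(shlex.join(unsandboxed))
```

In the first variant the container gets a more permissive syscall filter, while in the second the filter stays at Docker's default and Chrome gives up its own sandbox.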

Any points of view on that? I know that for critical infrastructure the solution would probably be something like a hardened Linux VM with a whitelist of URLs for Chrome, with the VM deleted and recreated on every request.

But I lack the data to decide between the less-secure-but-more-practical options, for example:

"How likely is it that a bad website can escape the Chrome sandbox, detect that it is inside a Docker container, and use clone and unshare to compromise the host?"

"What damage can a bad website cause to a non-sandboxed Chrome process that is stuck in a container with Docker's isolation?"
