Data Persistence#
User Data#
The user servers are started using JupyterHub’s SystemUserSpawner.
This spawner is based on the DockerSpawner, but makes it possible to use the host users to start the notebook servers.
Concretely this means that the user inside the container corresponds to a real user that exists on the host.
Processes will be started by that user, instead of the default jovyan
user that is usually found in the regular
Jupyter Docker images and on Binder.
For example when the user foo
starts their server, the list of processes looks like the following:
foo@9cf23d669647:~$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 1.1 0.0 50944 3408 ? Ss 11:17 0:00 su - foo -m -c "$0" "$@" -- /srv/conda/envs/notebook/bin/jupyterhub-singleuser --ip=0.0.0.0 --port=8888 --NotebookApp.default_url=/lab --ResourceUseDisplay.track_cpu_percent=True
foo 32 5.4 0.8 399044 70528 ? Ssl 11:17 0:01 /srv/conda/envs/notebook/bin/python /srv/conda/envs/notebook/bin/jupyterhub-singleuser --ip=0.0.0.0 --port=8888 --NotebookApp.default_url=/lab --ResourceUseDisplay.track_cpu_percent=True
foo 84 0.0 0.0 20312 4036 pts/0 Ss 11:17 0:00 /bin/bash -l
foo 112 29.0 0.5 458560 46448 ? Ssl 11:17 0:00 /srv/conda/envs/notebook/bin/python -m bash_kernel -f /home/foo/.local/share/jupyter/runtime/kernel-9a7c8ad3-4ac2-4754-88cc-ef746d1be83e.json
foo 126 0.5 0.0 20180 3884 pts/1 Ss+ 11:17 0:00 /bin/bash --rcfile /srv/conda/envs/notebook/lib/python3.8/site-packages/pexpect/bashrc.sh
foo 140 0.0 0.0 36076 3368 pts/0 R+ 11:17 0:00 ps aux
The following steps happen when a user starts their server:
Mount the user home directory on the host into the container. This means that the file structure in the container reflects what is on the host.
A new directory is created in the user home directory for each new environment (i.e for each Docker image). For example if a user starts the
2020-python-course
environment, there will be a new folder created under/home/user/2020-python-course
. This folder is then persisted to disk in the user home directory on the host. Any file and notebook created from the notebook interface are also persisted to disk.On server startup, the entrypoint script copies the files from the base image that are initially in
/home/jovyan
to/home/user/2020-python-course
in the container. They are then persisted in/home/user/2020-python-course
on the host.
The files highlighted in blue correspond to the files initially bundled in the environment. These files are copied to the environment subdirectory in the user home directory on startup.
The other files are examples of files created by the user.
User server startup#
The user server is started from the environment directory:
The rest of the user files are mounted into the container, see User Data.
A user can for example open a terminal and access their files by typing cd
.
They can then inspect their files:
foo@3e29b2297563:/home/foo$ ls -lisah
total 56K
262882 4.0K drwxr-xr-x 9 foo foo 4.0K Apr 21 16:53 .
6205024 4.0K drwxr-xr-x 1 root root 4.0K Apr 21 16:50 ..
266730 4.0K -rw------- 1 foo foo 228 Apr 21 14:41 .bash_history
262927 4.0K -rw-r--r-- 1 foo foo 220 May 5 2019 .bash_logout
262928 4.0K -rw-r--r-- 1 foo foo 3.7K May 5 2019 .bashrc
1043206 4.0K drwx------ 3 foo foo 4.0K Apr 21 09:26 .cache
528378 4.0K drwx------ 3 foo foo 4.0K Apr 17 17:36 .gnupg
1565895 4.0K drwxrwxr-x 2 foo foo 4.0K Apr 21 09:55 .ipynb_checkpoints
1565898 4.0K drwxr-xr-x 5 foo foo 4.0K Apr 21 09:27 .ipython
1565880 4.0K drwxrwxr-x 3 foo foo 4.0K Apr 21 09:26 .local
262926 4.0K -rw-r--r-- 1 foo foo 807 May 5 2019 .profile
1050223 4.0K drwxrwxr-x 12 foo foo 4.0K Apr 20 10:44 2020-python-course
1043222 4.0K drwxrwxr-x 13 foo foo 4.0K Apr 20 17:07 r-intro
258193 4.0K -rw-rw-r-- 1 foo foo 843 Apr 21 09:56 Untitled.ipynb