Docker 内存资源限制

1. 压测工具

通过如下 Dockerfile 构建简单的测试镜像

➜  cat Dockerfile
FROM ubuntu:latest

RUN apt-get update && \
    apt-get install stress
➜   docker build -t ubuntu-stress:latest .

2. 内存测试

  • Runtime constraints on resources
  • 目前 Docker 支持内存资源限制选项
    • -m--memory=""
      • Memory limit (format: <number>[<unit>] ). Number is a positive integer. Unit can be one of b , k , m , or g . Minimum is 4M.
    • --memory-swap=""
      • Total memory limit (memory + swap, format: <number>[<unit>] ). Number is a positive integer. Unit can be one of b , k , m , or g .
    • --memory-swappiness=""
      • Tune a container's memory swappiness behavior. Accepts an integer between 0 and 100.
    • --shm-size=""
      • Size of /dev/shm . The format is <number><unit> . number must be greater than 0. Unit is optional and can be b (bytes), k (kilobytes), m (megabytes), or g (gigabytes). If you omit the unit, the system uses bytes. If you omit the size entirely, the system uses 64m .
      • 根据实际需求设置,这里不作过多的介绍
    • --memory-reservation=""
      • Memory soft limit (format: <number>[<unit>] ). Number is a positive integer. Unit can be one of b , k , m , or g .
    • --kernel-memory=""
      • Kernel memory limit (format: <number>[<unit>] ). Number is a positive integer. Unit can be one of b , k , m , or g . Minimum is 4M.
      • kernel memory 没有特殊需求,则无需额外设置
    • --oom-kill-disable=false
      • Whether to disable OOM Killer for the container or not.

默认启动一个 container,对于容器的内存是没有任何限制的。

➜  ~ docker help run | grep memory  # 测试 docker 版本 1.10.2,宿主系统 Ubuntu 14.04.1
  --kernel-memory                 Kernel memory limit
  -m, --memory                    Memory limit
  --memory-reservation            Memory soft limit
  --memory-swap                   Swap limit equal to memory plus swap: '-1' to enable unlimited swap
  --memory-swappiness=-1          Tune container memory swappiness (0 to 100)
➜  ~

2.1. -m ... --memory-swap ...

指定限制内存大小并且设置 memory-swap 值为 -1,表示容器程序使用内存受限,而 swap 空间使用不受限制(宿主 swap 支持使用多少则容器即可使用多少。如果 --memory-swap 设置小于 --memory 则设置不生效,使用默认设置)。

➜  ~ docker run -it --rm -m 100M --memory-swap -1 ubuntu-stress:latest /bin/bash
root@4b61f98e787d:/# stress --vm 1 --vm-bytes 1000M  # 通过 stress 工具对容器内存做压测
stress: info: [14] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd

使用 docker stats 查看当前容器内存资源使用:

➜  ~ docker stats 4b61f98e787d
CONTAINER           CPU %               MEM USAGE/LIMIT     MEM %               NET I/O
4b61f98e787d        6.74%               104.8 MB/104.9 MB   99.94%              4.625 kB/648 B

通过 top 实时监控 stress 进程内存占用:

➜  ~ pgrep stress
8209
8210    # 需查看 stress 子进程占用,
➜  ~ top -p 8210    # 显示可以得知 stress 的 RES 占用为 100m,而 VIRT 占用为 1007m
top - 17:51:31 up 35 min,  2 users,  load average: 1.14, 1.11, 1.06
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  3.1 sy,  0.0 ni, 74.8 id, 21.9 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8102564 total,  6397064 used,  1705500 free,   182864 buffers
KiB Swap: 15625212 total,  1030028 used, 14595184 free.  4113952 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8210 root      20   0 1007.1m 100.3m   0.6m D  13.1  1.3   0:22.59 stress

也可以通过如下命令获取 stress 进程的 swap 占用:

➜  ~ for file in /proc/*/status ; do awk '/VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | sort -k 2 -n -r | grep stress
stress 921716 kB
stress 96 kB
➜  ~

按照官方文档的理解,如果指定 -m 内存限制时不添加 --memory-swap 选项,则表示容器中程序可以使用 100M 内存和 100M swap 内存。默认情况下, --memory-swap 会被设置成 memory 的 2 倍。

We set memory limit(300M) only, this means the processes in the container can use 300M memory and 300M swap memory, by default, the total virtual memory size --memory-swap will be set as double of memory, in this case, memory + swap would be 2*300M, so processes can use 300M swap memory as well.

如果按照以上方式运行容器提示如下信息:

WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.

可参考 Adjust memory and swap accounting 获取解决方案:

  • To enable memory and swap on system using GNU GRUB (GNU GRand Unified Bootloader), do the following:
    • Log into Ubuntu as a user with sudo privileges.
    • Edit the /etc/default/grub file.
    • Set the GRUB_CMDLINE_LINUX value as follows:
      • GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
    • Save and close the file.
    • Update GRUB.
      • $ sudo update-grub
    • Reboot your system.
➜  ~ docker run -it --rm -m 100M ubuntu-stress:latest /bin/bash
root@ed670cdcb472:/# stress --vm 1 --vm-bytes 200M # 压测 200M,stress 进程会被立即 kill 掉
stress: info: [17] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [17] (416) <-- worker 18 got signal 9
stress: WARN: [17] (418) now reaping child worker processes
stress: FAIL: [17] (452) failed run completed in 2s
root@ed670cdcb472:/# stress --vm 1 --vm-bytes 199M

docker stats 和 top 获取资源占用情况:

➜  ~ docker stats ed670cdcb472
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O
ed670cdcb472        13.35%              104.3 MB / 104.9 MB   99.48%              6.163 kB / 648 B    26.23 GB / 29.21 GB
➜  ~ pgrep stress
16322
16323
➜  ~ top -p 16323
top - 18:12:31 up 56 min,  2 users,  load average: 1.07, 1.07, 1.05
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.8 us,  4.0 sy,  0.0 ni, 69.6 id, 21.4 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem:   8102564 total,  6403040 used,  1699524 free,   184124 buffers
KiB Swap: 15625212 total,   149996 used, 15475216 free.  4110440 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16323 root      20   0  206.1m  91.5m   0.6m D   9.9  1.2   0:52.58 stress
➜  ~ docker run -it --rm -m 100M --memory-swap 400M ubuntu-stress:latest /bin/bash
root@5ed1fd88a1aa:/# stress --vm 1 --vm-bytes 400M  # 压测到 400M 程序会被 kill
stress: info: [24] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [24] (416) <-- worker 25 got signal 9
stress: WARN: [24] (418) now reaping child worker processes
stress: FAIL: [24] (452) failed run completed in 3s
root@5ed1fd88a1aa:/# stress --vm 1 --vm-bytes 399M  # 压测到 399M 程序刚好可以正常运行(这个值已经处于临界了,不保证不被 kill)

docker stats 和 top 获取资源占用情况:

➜  ~ docker stats 5ed1fd88a1aa
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O
5ed                 12.44%              104.8 MB / 104.9 MB   99.92%              4.861 kB / 648 B    9.138 GB / 10.16 GB
➜  ~ pgrep stress
22721
22722
➜  ~ top -p 22722
top - 18:18:58 up  1:02,  2 users,  load average: 1.04, 1.04, 1.05
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.4 us,  3.3 sy,  0.0 ni, 73.7 id, 21.6 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem:   8102564 total,  6397416 used,  1705148 free,   184608 buffers
KiB Swap: 15625212 total,   366160 used, 15259052 free.  4102076 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
22722 root      20   0  406.1m  84.1m   0.7m D  11.7  1.1   0:08.82 stress

根据实际测试可以理解, -m 为物理内存上限,而 --memory-swap 则是 memory + swap 之和,当压测值是 --memory-swap 上限时,则容器中的进程会被直接 OOM kill。

2.2. -m ... --memory-swappiness ...

swappiness 可以认为是宿主 /proc/sys/vm/swappiness 设定:

Swappiness is a Linux kernel parameter that controls the relative weight given to swapping out runtime memory, as opposed to dropping pages from the system page cache. Swappiness can be set to values between 0 and 100 inclusive. A low value causes the kernel to avoid swapping, a higher value causes the kernel to try to use swap space. Swappiness

--memory-swappiness=0 表示禁用容器 swap 功能(这点不同于宿主机,宿主机 swappiness 设置为 0 也不保证 swap 不会被使用):

➜  ~ docker run -it --rm -m 100M --memory-swappiness=0 ubuntu-stress:latest /bin/bash
root@e3fd6cc73849:/# stress --vm 1 --vm-bytes 100M  # 没有任何商量的余地,到达 100M 直接被 kill
stress: info: [18] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [18] (416) <-- worker 19 got signal 9
stress: WARN: [18] (418) now reaping child worker processes
stress: FAIL: [18] (452) failed run completed in 0s
root@e3fd6cc73849:/#

2.3. --memory-reservation ...

--memory-reservation ... 选项可以理解为内存的软限制。如果不设置 -m 选项,那么容器使用内存可以理解为是不受限的。按照官方的说法,memory reservation 设置可以确保容器不会长时间占用大量内存。

2.4. --oom-kill-disable

➜  ~ docker run -it --rm -m 100M --memory-swappiness=0 --oom-kill-disable ubuntu-stress:latest /bin/bash
root@f54f93440a04:/# stress --vm 1 --vm-bytes 200M  # 正常情况不添加 --oom-kill-disable 则会直接 OOM kill,加上之后则达到限制内存之后也不会被 kill
stress: info: [17] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd

但是如果是以下的这种没有对容器作任何资源限制的情况,添加 --oom-kill-disable 选项就比较 危险 了:

$ docker run -it --oom-kill-disable ubuntu:14.04 /bin/bash

因为此时容器内存没有限制,并且不会被 oom kill,此时系统则会 kill 系统进程用于释放内存。

2.5. --kernel-memory

Kernel memory is fundamentally different than user memory as kernel memory can't be swapped out. The inability to swap makes it possible for the container to block system services by consuming too much kernel memory. Kernel memory includes:

  • stack pages
  • slab pages
  • sockets memory pressure
  • tcp memory pressure

这里直接引用 Docker 官方介绍,如果无特殊需求,kernel-memory 一般无需设置,这里不作过多说明。

3. 内存资源限制 Docker 源码解析

关于 Docker 资源限制主要是依赖 Linux cgroups 去实现的,关于 cgroups 资源限制实现可以参考:Docker背后的内核知识-cgroups资源限制, libcontainer 配置相关的选项:

  • github.com/opencontainers/runc/libcontainer/cgroups/fs/memory.go
68 func (s *MemoryGroup) Set(path string, cgroup *configs.Cgroup) error {
69     if cgroup.Resources.Memory != 0 {
70         if err := writeFile(path, "memory.limit_in_bytes", strconv.FormatInt(cgroup.Resources.Memory, 10)); err != nil {
71             return err
72         }
73     }
74     if cgroup.Resources.MemoryReservation != 0 {
75         if err := writeFile(path, "memory.soft_limit_in_bytes", strconv.FormatInt(cgroup.Resources.MemoryReservation, 10)); err != nil {
76             return err
77         }
78     }
79     if cgroup.Resources.MemorySwap > 0 {
80         if err := writeFile(path, "memory.memsw.limit_in_bytes", strconv.FormatInt(cgroup.Resources.MemorySwap, 10)); err != nil {
81             return err   // 如果 MemorySwap 没有设置,则 cgroup 默认设定值是 Memory 2 倍,详见后文测试
82         }
83     }
84     if cgroup.Resources.OomKillDisable {
85         if err := writeFile(path, "memory.oom_control", "1"); err != nil {
86             return err
87         }
88     }
89     if cgroup.Resources.MemorySwappiness >= 0 && cgroup.Resources.MemorySwappiness <= 100 {
90         if err := writeFile(path, "memory.swappiness", strconv.FormatInt(cgroup.Resources.MemorySwappiness, 10)); err != nil {
91             return err
92         }
93     } else if cgroup.Resources.MemorySwappiness == -1 {
94         return nil  // 如果 MemorySwappiness 设置为 -1,则不做任何操作,经测试默认值为 60,后文附测试
95     } else {
96         return fmt.Errorf("invalid value:%d. valid memory swappiness range is 0-100", cgroup.Resources.MemorySwappiness)
97     }
98
99     return nil
100 }

附测试:

➜  ~ docker run -it --rm -m 100M --memory-swappiness=-1 ubuntu-stress:latest /bin/bash
root@fbe9b0abf665:/#

查看宿主对应 container cgroup 对应值:

➜  ~ cd /sys/fs/cgroup/memory/docker/fbe9b0abf665b77fff985fd04f85402eae83eb7eb7162a30070b5920d50c5356
➜  fbe9b0abf665b77fff985fd04f85402eae83eb7eb7162a30070b5920d50c5356 cat memory.swappiness           # swappiness 如果设置 -1 则该值默认为 60
60
➜  fbe9b0abf665b77fff985fd04f85402eae83eb7eb7162a30070b5920d50c5356 cat memory.memsw.limit_in_bytes # 为设置的 memory 2 倍
209715200
➜  fbe9b0abf665b77fff985fd04f85402eae83eb7eb7162a30070b5920d50c5356
Kumu / 2016-03-08 Tue 00:00Emacs 28.2 (Org mode 9.5.5)