Friday, March 11, 2011

Understanding Linux buffers/cached (part I)

For a while now we've been trying to get a better understanding of the Linux buffer/page cache. With recent kernel versions we have noticed an increase in kswapd CPU usage on our database servers and have been trying to understand where it comes from.

As a first step we wanted to get a better idea of what kind of disk activity makes use of the buffer/page cache. This utilisation can be seen by running:
# free -m
             total       used       free     shared    buffers     cached
Mem:         32240        123      32116          0          0          3
-/+ buffers/cache:        119      32120
Swap:         9546         21       9524
In this case the server has 32G of RAM and practically no buffer or cache usage. The kernel is 2.6.32.6.
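The "-/+ buffers/cache" line is simply the "Mem:" line with buffers and cached counted as free instead of used. Here is a small sketch of that arithmetic; the function name adjust_for_cache is made up for illustration and is not part of free:

```shell
# adjust_for_cache USED FREE BUFFERS CACHED
# Hypothetical helper: print "used free" with buffers and page cache counted
# as free memory, which is how the "-/+ buffers/cache" line is derived.
adjust_for_cache() {
    echo "$(( $1 - $3 - $4 )) $(( $2 + $3 + $4 ))"
}

# With the MB values from the free -m output above:
#   adjust_for_cache 123 32116 0 3     # prints "120 32119"
```

free itself reports 119 and 32120. The one-megabyte differences are consistent with free converting each kB figure to MB separately, so sums of the displayed values can be off by one.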

The kernel /proc documentation has the following to say about buffers and cached:
     Buffers: Relatively temporary storage for raw disk blocks
              shouldn't get tremendously large (20MB or so)
      Cached: in-memory cache for files read from the disk (the
              pagecache). Doesn't include SwapCached
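These two fields can also be read directly from /proc/meminfo. A minimal sketch, assuming the usual "Key:   value kB" layout of that file; the helper name meminfo_kb and its optional file argument are my own invention:

```shell
# meminfo_kb KEY [FILE]
# Hypothetical helper: print the value (in kB) of one meminfo field such as
# Buffers or Cached. FILE defaults to /proc/meminfo.
meminfo_kb() {
    # Exact match on "KEY:" so that Cached does not also match SwapCached.
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Example usage:
#   meminfo_kb Buffers
#   meminfo_kb Cached
```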
We have performed various tests using dd to try and verify this. We used dd to copy over a chunk of 16G to/from various sources/destinations:
  • virtual devices (/dev/zero, /dev/null)
  • real devices (/dev/sdb (SATA), /dev/sdc (SAS))
  • files on a filesystem (/mnt/sdc1/file.in (xfs), /mnt/sdb1/file.out (ext3))
For now I'm using 16G chunks because I want to make sure that no cleanup is required in the caches (i.e. both input and output can fit in memory concurrently). Before each test I cleared the Linux caches using the following command:
# echo 3 > /proc/sys/vm/drop_caches
to ensure that any cache contents during or after the test were in fact put there by the test. I also ran free every 10 seconds during each test and captured the first and last outputs to demonstrate the cache usage change.
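The procedure above could be scripted roughly as follows. This is only a sketch, not the exact script used for these tests: the function names and the log file path are hypothetical, and the sync before dropping the caches is an assumption added here so that dirty pages are written out first.

```shell
# first_and_last FILE
# Print the first and the last line of FILE (only once if it has one line).
first_and_last() {
    awk 'NR == 1 { print } { last = $0 } END { if (NR > 1) print last }' "$1"
}

# run_dd_test LOG DD_ARGS...
# Hypothetical wrapper: clear the caches, sample the "Mem:" line of free every
# 10 seconds while dd runs, then show the first and last samples. Needs root.
run_dd_test() {
    log=$1; shift
    sync                                  # assumption: not in the original procedure
    echo 3 > /proc/sys/vm/drop_caches
    free | grep '^Mem:' > "$log"          # first sample, taken before dd starts
    ( while :; do sleep 10; free | grep '^Mem:' >> "$log"; done ) &
    sampler=$!
    dd "$@"
    kill "$sampler"; wait "$sampler" 2>/dev/null || true
    free | grep '^Mem:' >> "$log"         # final sample, taken after dd has finished
    first_and_last "$log"
}

# Example (test 2 below; the log path is arbitrary):
#   run_dd_test /tmp/test2.log if=/dev/zero of=/dev/sdb2 bs=1024k count=16384
```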

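Here are the test results. For each test, the two Mem: lines are the first and last free outputs (values in kB; the columns are total, used, free, shared, buffers and cached), followed by the dd summary line: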
1- dd if=/dev/zero of=/dev/null bs=1024k count=16384
Mem: 33014136 129440 32884696 0 296 4044
Mem: 33014136 132016 32882120 0 352 4784
17179869184 bytes (17 GB) copied, 2.33022 s, 7.4 GB/s

2- dd if=/dev/zero of=/dev/sdb2 bs=1024k count=16384
Mem: 33014136 138268 32875868 0 6428 4220
Mem: 33014136 18763244 14250892 0 16777668 4712
17179869184 bytes (17 GB) copied, 174.92 s, 98.2 MB/s

3- dd if=/dev/zero of=/mnt/sdb1/file.out bs=1024k count=16384
Mem: 33014136 126256 32887880 0 104 4308
Mem: 33014136 17297180 15716956 0 17668 4528
17179869184 bytes (17 GB) copied, 201.586 s, 85.2 MB/s

4- dd if=/dev/sdc2 of=/dev/null bs=1024k count=16384
Mem: 33014136 968960 32045176 0 1976 4424
Mem: 33014136 20255872 12758264 0 15836544 814648
17179869184 bytes (17 GB) copied, 93.7654 s, 183 MB/s

5- dd if=/dev/sdc2 of=/dev/sdb2 bs=1024k count=16384
Mem: 33014136 128228 32885908 0 192 4616
Mem: 33014136 32944064 70072 0 28176400 4524
17179869184 bytes (17 GB) copied, 170.546 s, 101 MB/s

6- dd if=/dev/sdc2 of=/mnt/sdb1/file.out bs=1024k count=16384
Mem: 33014136 129748 32884388 0 120 4260
Mem: 33014136 32946356 67780 0 14586200 14619404
17179869184 bytes (17 GB) copied, 207.602 s, 82.8 MB/s

7- dd if=/mnt/sdc1/file.in of=/dev/null bs=1024k count=16384
Mem: 33014136 132580 32881556 0 136 4780
Mem: 33014136 16410388 16603748 0 212 16253484
17179869184 bytes (17 GB) copied, 82.7158 s, 208 MB/s

8- dd if=/mnt/sdc1/file.in of=/dev/sdb2 bs=1024k count=16384
Mem: 33014136 126680 32887456 0 232 4040
Mem: 33014136 32941804 72332 0 15528188 15531504
17179869184 bytes (17 GB) copied, 171.806 s, 100 MB/s

9- dd if=/mnt/sdc1/file.in of=/mnt/sdb1/file.out bs=1024k count=16384
Mem: 33014136 130808 32883328 0 128 4600
Mem: 33014136 32944700 69436 0 18676 32287016
17179869184 bytes (17 GB) copied, 204.688 s, 83.9 MB/s
Here's what I conclude from these results:
  • I/O to/from virtual devices (/dev/zero, /dev/null) has practically no effect on the buffer or page cache (test 1);
  • I/O to/from real devices uses the buffer cache (tests 2, 4 and 5);
  • I/O to/from files uses the page cache (tests 7 and 9); maybe buffers as well, but if so they are freed again?
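To compare the tests more easily, the growth of the buffers and cached columns can be computed from the two Mem: lines of each result. A quick sketch, with a made-up function name, assuming the standard column order of free (total, used, free, shared, buffers, cached) and values in kB:

```shell
# cache_delta_mb "FIRST_MEM_LINE" "LAST_MEM_LINE"
# Hypothetical helper: given two "Mem:" lines from free (kB), print how much
# the buffers and cached columns changed, in whole MB.
cache_delta_mb() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { b = $6; c = $7 }   # $6 = buffers, $7 = cached
        NR == 2 { printf "buffers %+d MB, cached %+d MB\n", ($6 - b) / 1024, ($7 - c) / 1024 }'
}

# Test 2 (/dev/zero to /dev/sdb2):
#   cache_delta_mb "Mem: 33014136 138268 32875868 0 6428 4220" \
#                  "Mem: 33014136 18763244 14250892 0 16777668 4712"
#   prints "buffers +16378 MB, cached +0 MB"
```

Test 7 (xfs file to /dev/null) gives the opposite picture: buffers +0 MB, cached +15867 MB.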
Other observations:
  • Writing to /dev/sdb2: about 100 MB/s (expected);
  • Writing to the /mnt/sdb1 ext3 file system: 82-85 MB/s (expected to be slower than the raw device);
  • Reading from the raw device /dev/sdc2 (183 MB/s) is, surprisingly, slower than reading from the /mnt/sdc1 xfs filesystem (208 MB/s).
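The throughput figures come straight from the dd summary lines and can be re-derived from the byte count and elapsed time. A small sketch, with a hypothetical function name; note that dd reports decimal megabytes (1 MB = 1000000 bytes):

```shell
# throughput_mbs BYTES SECONDS
# Hypothetical helper: print throughput in MB/s, using 1 MB = 1000000 bytes.
throughput_mbs() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# bs=1024k count=16384 is 16 GiB, the byte count shown in every result:
#   echo $(( 1024 * 1024 * 16384 ))          # prints 17179869184
# Test 2 (/dev/zero to /dev/sdb2):
#   throughput_mbs 17179869184 174.92        # prints 98.2
```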
In my next post I'll use 32G chunks to see how the different caches "compete" against each other and how cache cleanup (kswapd) affects performance.
