Cipher benchmark for dm-crypt / LUKS
Do you have a netbook, laptop, desktop or server that uses dm-crypt to encrypt data on its disks? If so, you will probably find that raw hard disk performance is better than encrypted disk performance. You will notice this especially on slow machines (e.g. netbooks), but also on high-performance servers, because of the current dm-crypt design.
What cipher in the Linux kernel provides you with the best performance?
Currently, dm-crypt in the Linux kernel suffers from at least one performance flaw: it is not SMP-aware. This means that even if you have several CPUs in your machine, only one processor will be used to encrypt/decrypt data (ed. 31-May-2010: there was a patch posted today to make dm-crypt scale to multiple CPUs).
With moderately fast disks and RAID arrays in a server, you will hit a limit where one processor is not able to encrypt/decrypt data fast enough. With netbooks and slow CPUs, and probably fast SSDs, you will hit this limit even earlier.
Here is a list of different ciphers and the throughput they delivered when reading linearly from a given device.
The tests were made on a Celeron 2.93GHz CPU with Seagate Barracuda 7200.11 SATA 3Gb/s 1.5-TB ST31500341AS disks. Raw linear speed of these disks was about 105 MB/s.
What performance could different ciphers deliver on this machine? Note that you have to consider the security implications and encryption strength yourself when using custom encryption schemes (e.g. using -essiv instead of -plain on similar hardware will usually decrease the performance by about 10 MB/s, but your encryption should be “harder to crack”).
Default parameters for creating an encrypted device are:
cryptsetup luksFormat /dev/$DEVICE
You can add options like:
cryptsetup luksFormat -c cast5-cbc-plain -s 128 /dev/$DEVICE
To open an encrypted device:
cryptsetup luksOpen /dev/$DEVICE $SOMENAME
You will have a new block device in /dev/mapper/$SOMENAME, which you can use for a filesystem, for example.
To close the encrypted device:
cryptsetup luksClose $SOMENAME
The results (luksFormat options and linear read throughput):
-c tnepres 20.1 MB/s
-c serpent 20.4 MB/s
-c seed-ecb-plain -s 256 20.5 MB/s
-c fcrypt-pcbc-plain -s 64 30.4 MB/s
-c khazad-ecb-plain -s 128 31.7 MB/s
-c xtea-ecb-plain -s 128 32.0 MB/s
-c arc4 32.1 MB/s
-c xeta-ecb-plain -s 128 32.1 MB/s
-c twofish 34.2 MB/s
-c anubis-cbc-plain -s 256 37.5 MB/s
-c anubis -s 256 37.8 MB/s
-c tea-ecb-plain -s 128 38.1 MB/s
-c anubis-ecb-plain -s 256 39.6 MB/s
-c cast6-cbc-plain -s 256 40.0 MB/s
-c cast6 40.7 MB/s
-c des-ecb-plain -s 64 42.0 MB/s
-c camellia -s 256 42.2 MB/s
-c anubis -s 128 46.4 MB/s
-c anubis-cbc-plain -s 128 47.5 MB/s
-c anubis-ecb-plain -s 128 49.4 MB/s
-c cast5-cbc-plain -s 128 50.2 MB/s
-c camellia -s 128 51.4 MB/s
-c aes -s 256 55.9 MB/s
-c aes-cbc-plain -s 256 56.4 MB/s
-c aes-cbc-benbi -s 256 56.7 MB/s
-c aes-cbc-null -s 256 57.0 MB/s
-c blowfish 57.2 MB/s
-c aes-ecb-benbi -s 256 58.8 MB/s
-c aes-ecb-null -s 256 59.5 MB/s
-c aes-ecb-plain -s 256 60.3 MB/s
-c blowfish-ecb-plain 61.4 MB/s
-c aes-xts-plain -s 256 61.6 MB/s
-c aes-lrw-plain -s 256 62.8 MB/s
-c aes-cbc-plain -s 128 66.8 MB/s
-c aes-ctr-plain -s 128 67.0 MB/s
-c aes-cbc-null -s 128 67.1 MB/s
-c aes-cbc-benbi -s 128 67.4 MB/s
-c aes -s 128 67.5 MB/s
-c aes-ecb-plain -s 128 71.0 MB/s
-c aes-ecb-benbi -s 128 71.2 MB/s
-c aes-ecb-null -s 128 71.5 MB/s
The benchmarks were made with dd (bs=64k, 3 GB read), repeated several times; caches were dropped before each test.
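As a rough illustration of the method just described, one run could look like the sketch below. The helper names bench_read and dd_count are made up for this example, "3 GB" is assumed to mean 3 GiB, and the exact commands used for the article are not known. It needs root, because dropping caches writes to /proc/sys/vm/drop_caches.

```shell
#!/bin/sh
# Sketch only: repeats the linear-read test described above. Needs root.
# bench_read and dd_count are hypothetical helper names, not from the article.

# Number of dd blocks needed to read $1 GiB with a block size of $2 KiB.
dd_count() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

# Drop the page cache, then read 3 GiB linearly from the given block device.
# dd prints its throughput summary on stderr; only that last line is kept.
bench_read() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
    dd if="$1" of=/dev/null bs=64k count="$(dd_count 3 64)" 2>&1 | tail -n 1
}
```

Calling bench_read /dev/mapper/$SOMENAME several times and comparing the reported rates would approximate the procedure; the figures in the table are the article's own measurements, not output of this sketch.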
I am missing ESSIV mode (with AES, for instance). Did you leave it out intentionally?
I was looking for the fastest mode for my machine.
-essiv on the above hardware (2.9 GHz Celeron) performed about 10 MB/s slower than -plain (you will find this note in the article as well).
My results on an Intel Atom 330 with a loopback device on RAM. The script can be downloaded from http://www.holtznet.de/luks/
Create options write read
-c aes-cbc-essiv:sha256 -s 128 29.2 MB/s 31.7 MB/s
-c aes-xts-plain -s 128 28.3 MB/s 31.7 MB/s
-c aes-cbc-essiv:sha256 -s 196 28.1 MB/s 32.3 MB/s
-c aes-xts-plain -s 196 28.6 MB/s 31.9 MB/s
-c aes-cbc-essiv:sha256 -s 256 23.2 MB/s 25.1 MB/s
-c aes-xts-plain -s 256 31.2 MB/s 33.9 MB/s
-c arc4-cbc-essiv:sha256 -s arc4 30.2 MB/s 33.6 MB/s
-c arc4-xts-plain -s arc4 31.8 MB/s 34.4 MB/s
-c des-cbc-essiv:sha256 -s 128 31.0 MB/s 34.0 MB/s
-c des-xts-plain -s 128 31.5 MB/s 33.8 MB/s
-c des-cbc-essiv:sha256 -s 256 32.1 MB/s 34.0 MB/s
-c des-xts-plain -s 256 30.4 MB/s 34.2 MB/s
-c blowfish-cbc-essiv:sha256 -s 128 22.8 MB/s 27.5 MB/s
-c blowfish-xts-plain -s 128 23.1 MB/s 27.1 MB/s
-c blowfish-cbc-essiv:sha256 -s 196 22.8 MB/s 27.6 MB/s
-c blowfish-xts-plain -s 196 23.0 MB/s 27.2 MB/s
-c blowfish-cbc-essiv:sha256 -s 256 23.4 MB/s 27.6 MB/s
-c blowfish-xts-plain -s 256 22.7 MB/s 27.4 MB/s
-c anubis-cbc-essiv:sha256 -s 128 20.6 MB/s 22.6 MB/s
-c anubis-xts-plain -s 128 20.9 MB/s 22.9 MB/s
-c anubis-cbc-essiv:sha256 -s 256 17.1 MB/s 18.5 MB/s
-c anubis-xts-plain -s 256 21.9 MB/s 23.6 MB/s
-c cast5-cbc-essiv:sha256 -s 128 22.5 MB/s 23.5 MB/s
-c cast5-xts-plain -s 128 21.8 MB/s 23.6 MB/s
-c camellia-cbc-essiv:sha256 -s 128 20.6 MB/s 19.2 MB/s
-c camellia-xts-plain -s 128 20.4 MB/s 19.2 MB/s
-c camellia-cbc-essiv:sha256 -s 196 20.7 MB/s 19.5 MB/s
-c camellia-xts-plain -s 196 20.9 MB/s 19.4 MB/s
-c camellia-cbc-essiv:sha256 -s 256 16.6 MB/s 15.2 MB/s
-c camellia-xts-plain -s 256 22.2 MB/s 20.0 MB/s
-c twofish-cbc-essiv:sha256 -s 128 22.9 MB/s 26.2 MB/s
-c twofish-xts-plain -s 128 23.2 MB/s 26.2 MB/s
-c twofish-cbc-essiv:sha256 -s 196 23.6 MB/s 26.5 MB/s
-c twofish-xts-plain -s 196 23.3 MB/s 26.4 MB/s
-c twofish-cbc-essiv:sha256 -s 256 23.5 MB/s 26.6 MB/s
-c twofish-xts-plain -s 256 24.7 MB/s 27.1 MB/s
-c salsa20-cbc-essiv:sha256 -s 128 24.5 MB/s 27.6 MB/s
-c salsa20-xts-plain -s 128 24.8 MB/s 27.5 MB/s
-c salsa20-cbc-essiv:sha256 -s 160 25.3 MB/s 27.5 MB/s
-c salsa20-xts-plain -s 160 25.0 MB/s 27.5 MB/s
-c salsa20-cbc-essiv:sha256 -s 196 24.5 MB/s 27.7 MB/s
-c salsa20-xts-plain -s 196 24.7 MB/s 27.5 MB/s
-c salsa20-cbc-essiv:sha256 -s 256 24.7 MB/s 27.3 MB/s
-c salsa20-xts-plain -s 256 24.9 MB/s 27.3 MB/s
-c serpent-cbc-essiv:sha256 -s 128 22.3 MB/s 25.2 MB/s
-c serpent-xts-plain -s 128 23.0 MB/s 25.6 MB/s
-c serpent-cbc-essiv:sha256 -s 196 23.5 MB/s 26.2 MB/s
-c serpent-xts-plain -s 196 24.3 MB/s 26.2 MB/s
-c serpent-cbc-essiv:sha256 -s 256 22.9 MB/s 24.5 MB/s
-c serpent-xts-plain -s 256 23.9 MB/s 26.1 MB/s
@Frank: “-c aes-xts-plain -s 128” can’t work though. The key needs to be at least 256 bits long. The kernel Kconfig help suggests that for XTS you need to double the key size (so aes-xts-plain: 256/384/512 bits). That’s necessary because one part of the key is used by XTS and the other by AES. It’s also strange that the bigger the key size gets, the more read/write throughput you get.
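The key-splitting rule from this comment can be written down as a small check. This is only a sketch of the arithmetic (half of the -s value goes to AES, the other half to XTS); xts_aes_cipher_bits is a made-up helper name, not a cryptsetup feature, and the accepted sizes are the 256/384/512 quoted above.

```shell
#!/bin/sh
# xts_aes_cipher_bits is a hypothetical helper, not part of cryptsetup.
# Given the value passed to -s for aes-xts-*, print the AES key size that
# remains after half of the key material is reserved for the XTS tweak.
xts_aes_cipher_bits() {
    case "$1" in
        256|384|512)
            echo $(( $1 / 2 ))
            ;;
        *)
            echo "invalid -s $1 for aes-xts (expected 256, 384 or 512)" >&2
            return 1
            ;;
    esac
}
```

By this rule, -s 256 with aes-xts-plain corresponds to AES-128, while -s 128 is rejected.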
Seems a flawed post: the most important benchmark is missing, and that’s the one with the default options (without -c set), so now I am sitting here wondering where the default option sits in that league table.
@Chris: The default mode is aes-cbc-essiv:sha256 for LUKS with a 128-bit key; it’s documented in the cryptsetup man page.
To print a table with 3 digit precision and then say that the default (essiv) will be “about 10 MB/s” slower than something somewhere in the table makes no sense. If anything is worth measuring precisely for comparison’s sake, it’s the default. And if anything is worth listing explicitly in the table, it’s the default.
Another topic: Multi-core CPU measurements would be nice.
Jim,
it’s about “10 MB/s slower on similar hardware” and was consistent with my tests.
Multi-core CPU measurement would make no difference at all, as dm-crypt is not SMP-capable.
I don’t doubt that you measured 10 MB/s difference, but that’s 1 digit precision. You listed everything else with 3 digits, so this should be too.
Do you have an authoritative source that it is not able to use multiple cores? Because Fruhwirth even fought with the kernel devel guys to restructure the kernel so that LUKS could run LRW (hence parallel). And that was 5 years ago.
Well, you’ll find some more tests by Frank, if you’re interested in more results.
You can always make tests yourself if you want to know how it behaves on your hardware.
dm-crypt does not use more than one core; I gave a link to a discussion with crypt maintainers on dm-devel list, which is roughly one year old. Nothing changed here, AFAIK.
I tested about 1200 combinations of cipher, cipher mode, iv hash, and key length based on the available blkciphers, hashes, and modes in /proc/crypto on an OpenSuSE 11.2 system. Each test was run ten times on a one gig, memory backed loopback device. I skipped a number of irrelevant configurations (tnepres, xeta, arc4, essiv hashes for xts modes, etc.) but a few weird setups still make the list (michael_mic for an essiv hash and every possible key length for blowfish, for example.)
All tests were run on a system with a single dual core Xeon 5160 processor and 4 GB of RAM. Caches were dropped before each test.
http://skroz-www.s3.amazonaws.com/report.csv
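For anyone wanting to build a similar list of candidates, the cipher names can be pulled out of /proc/crypto roughly as follows. This is a sketch, not the script used for the report above: list_ciphers is a made-up name, and the type values matched ("cipher" and "blkcipher") are an assumption based on kernels of that era; newer kernels report other type names, so the filter may need adjusting.

```shell
#!/bin/sh
# list_ciphers is a hypothetical helper. It prints the "name" of every
# /proc/crypto entry whose "type" is cipher or blkcipher, one per line.
# An alternative input file can be given as $1 (handy for testing).
list_ciphers() {
    awk -F '[ \t]*:[ \t]*' '
        $1 == "name" { name = $2 }                      # remember current entry
        $1 == "type" && ($2 == "cipher" || $2 == "blkcipher") { print name }
    ' "${1:-/proc/crypto}" | sort -u
}
```

The names come back in kernel notation (for example cbc(aes) rather than aes-cbc), so they still have to be mapped to cryptsetup's -c syntax by hand.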
I think I see the problem with Frank’s benchmark results.
In his test script, he runs “cryptsetup … luksFormat …” without checking for an error. If a previous iteration left a valid LUKS device AND the failed luksFormat left loop0 unmodified, the subsequent luksOpen would succeed and the benchmark would run for the previous cipher-mode-iv combination. This appears to explain why there are valid results for impossible modes such as “-c aes-xts-plain -s 128” (128 is an invalid key size for aes-xts), “-c blowfish-xts-X” and “cast5-xts-X” (blowfish and cast5 have an 8-byte block size and won’t work in XTS mode at all), anything with salsa20 (salsa20 is a stream cipher), and “-c arc4-xts-plain -s arc4” (interesting key size, that).
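A guard along the following lines would avoid that failure mode. It is a sketch under assumptions, not a patch to Frank's script: format_or_skip, CRYPTSETUP and KEYFILE are names invented here, and the key file path is a placeholder.

```shell
#!/bin/sh
# Sketch only. format_or_skip, CRYPTSETUP and KEYFILE are hypothetical names;
# the original benchmark script is not reproduced here.
CRYPTSETUP=${CRYPTSETUP:-cryptsetup}
KEYFILE=${KEYFILE:-/tmp/bench.key}

# Usage: format_or_skip DEVICE [luksFormat options...]
# Returns non-zero when luksFormat fails, so the caller can skip the benchmark
# instead of reopening whatever LUKS header the previous iteration left behind.
format_or_skip() {
    dev=$1
    shift
    if "$CRYPTSETUP" -q luksFormat "$@" "$dev" "$KEYFILE"; then
        return 0
    fi
    echo "SKIP: luksFormat failed for: $*" >&2
    return 1
}

# Example use inside a benchmark loop (needs root and a scratch device):
#   format_or_skip /dev/loop0 -c aes-xts-plain -s 256 || continue
```

Overwriting the start of the scratch device before each luksFormat would be a second safeguard, since a stale header could then never be opened by accident.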