Wednesday, March 28, 2018

How to abuse NVDIMM support in Java 10

Java 10 can allocate the Java heap on a memory device other than plain RAM (JEP 316, the -XX:AllocateHeapAt option). The idea was to support Intel Optane and similar NVDIMM devices, but we can abuse the feature to run Java applications in compressed memory or on a filesystem. Let's play with something small.

Prerequisites:
root@user-Aspire-ES1-431:~# modinfo zram
filename:       /lib/modules/4.14.20-041420-generic/kernel/drivers/block/zram/zram.ko
description:    Compressed RAM Block Device
author:         Nitin Gupta
license:        Dual BSD/GPL
srcversion:     98BCA9B63810EF5DB65BF87
depends:     
retpoline:      Y
intree:         Y
name:           zram
vermagic:       4.14.20-041420-generic SMP mod_unload
signat:         PKCS#7
signer:       
sig_key:     
sig_hashalgo:   md4
parm:           num_devices:Number of pre-created zram devices (uint)
root@user-Aspire-ES1-431:~# modprobe zram num_devices=4
root@user-Aspire-ES1-431:~# echo 1G > /sys/block/zram0/disksize
[ 2562.326469] zram0: detected capacity change from 0 to 1073741824
root@user-Aspire-ES1-431:~# mkfs.ext4 /dev/zram0; mount /dev/zram0 /cmem
mke2fs 1.43.5 (04-Aug-2017)
Discarding device blocks: done                         
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: f8ddae16-ebd4-4615-80c3-8cf60a448526
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Allocating group tables: done                         
Writing inode tables: done                         
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

Even with the filesystem and compression overhead, running with AllocateHeapAt on zram isn't very slow. We could put a Java EE server into zram and fit the same set of applications on one node instead of two. Very tempting. The maximum heap size is set to 512 MB (on the fake NV device), and underneath it all we are still using RAM.
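One way to check the "isn't very slow" claim on your own machine is a small allocation-heavy program. The sketch below is hypothetical: it is not the benchmark used for this post, and the class name HeapChurn, the array sizes and the round count are arbitrary choices. Run it once with a plain heap and once with the heap on the zram mount, for example `java -Xmx512m -XX:AllocateHeapAt=/cmem/storage HeapChurn` (assuming a /cmem/storage directory exists on the zram filesystem, as in the strace output below), and compare the elapsed times.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical allocation-heavy workload (not the benchmark from this post). */
public class HeapChurn {

    /**
     * Allocates `rounds` batches of `arraysPerRound` byte arrays, keeping only the
     * newest `liveWindow` batches reachable so the GC has real work to do.
     * Returns a checksum so the allocations cannot be optimized away.
     */
    static long churn(int rounds, int arraysPerRound, int arraySize, int liveWindow) {
        Deque<byte[][]> live = new ArrayDeque<>();
        long checksum = 0;
        for (int r = 0; r < rounds; r++) {
            byte[][] batch = new byte[arraysPerRound][];
            for (int i = 0; i < arraysPerRound; i++) {
                byte[] a = new byte[arraySize];
                a[i % arraySize] = (byte) r;    // write into the fresh array
                checksum += a[i % arraySize];   // and read it back
                batch[i] = a;
            }
            live.addLast(batch);
            if (live.size() > liveWindow) {
                live.removeFirst();             // oldest batch becomes garbage
            }
        }
        return checksum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // ~64 MB stays live (64 batches x 256 arrays x 4 KB); about 2 GB is allocated in total
        long sum = churn(2_000, 256, 4_096, 64);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("max heap: " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        System.out.println("checksum: " + sum + ", elapsed: " + elapsedMs + " ms");
    }
}
```

Timings from a single run are noisy, so repeating each configuration a few times gives a fairer comparison.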

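And now the killer feature: putting a Java application's heap on a regular filesystem (/tmp). Why is it not slow? Because Linux has a very good filesystem cache (the page cache), so the mapped heap pages are normally served from memory.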

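Here is what the JVM does at startup with AllocateHeapAt, traced with strace -f (the heap directory in this run is /cmem/storage on the zram filesystem):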
[pid 31442] openat(AT_FDCWD, "/cmem/storage//jvmheap.d3tNHS", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
[pid 31442] unlink("/cmem/storage//jvmheap.d3tNHS" 
[pid 31442] fallocate(4, 0, 0, 536870912 
[pid 31442] mmap(0xe0000000, 536870912, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 4, 0) = 0xe0000000
[pid 31442] close(4)                    = 0

Access to NVDIMM memory that is exposed as a filesystem and allocated through files must be secure: no other process should be able to alter the memory file. That is why the JVM creates the file exclusively (O_CREAT|O_EXCL, mode 0600) and unlinks it immediately. If the name passed to unlink() was the last link to a file but a process still has the file open, the file remains in existence until the last reference to it goes away. fallocate() then reserves 512 MB for the file, mmap() maps it at the heap address with MAP_SHARED|MAP_FIXED, and close(4) drops the descriptor. Closing the file descriptor does not unmap the region; the mapping itself keeps the now nameless file alive. The region is removed either by munmap(), after which references to addresses within the range generate invalid memory references, or automatically when the process terminates. No Java thread can access the descriptor of the memory file, because that descriptor is already gone, and no other process can open the file by name, because it no longer has one. Security also gives us good clean-up: when the JVM exits, the last reference disappears and the filesystem reclaims the space.
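The same create/unlink/map/close idea can be tried from plain Java with FileChannel. This is only an analogue written for illustration, not how HotSpot does it (HotSpot issues the system calls shown in the trace directly). The class name UnlinkedMapping and the heap.&lt;uuid&gt; file name are hypothetical, and the code assumes Linux or another POSIX system where an open file can be deleted and 0600 permissions can be requested.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.UUID;

/** Java-level analogue of the JVM's create/unlink/map/close sequence (Linux/POSIX only). */
public class UnlinkedMapping {

    static MappedByteBuffer mapPrivateFile(Path dir, long size) throws IOException {
        // Hypothetical random name; CREATE_NEW (O_CREAT|O_EXCL) fails instead of reusing a file.
        Path file = dir.resolve("heap." + UUID.randomUUID());
        try (FileChannel ch = FileChannel.open(file,
                EnumSet.of(StandardOpenOption.CREATE_NEW,
                           StandardOpenOption.READ,
                           StandardOpenOption.WRITE),
                PosixFilePermissions.asFileAttribute(
                        PosixFilePermissions.fromString("rw-------")))) {   // mode 0600
            Files.delete(file);   // like unlink(): the name is gone, the channel still works
            // A READ_WRITE mapping past the end of the file grows it to `size` bytes
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }   // closing the channel does not invalidate the mapping
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        MappedByteBuffer heap = mapPrivateFile(dir, 1 << 20);
        heap.putLong(0, 42L);                 // memory is usable after the channel is closed
        System.out.println(heap.getLong(0));  // prints 42
    }
}
```

One difference from the JVM's sequence: Java has no public munmap(), so the MappedByteBuffer is unmapped when it is garbage-collected, or at the latest when the process exits.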

When the heap is mmap'ed from a file in /tmp, it lives in filesystem buffers (the page cache), so we effectively get swapping at the application level, controlled by us, rather than swapping by the OS, which can hit any application. With zram, we get transparent memory compression.
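To see how much the compression actually saves, zram reports statistics in /sys/block/zram0/mm_stat. Per the kernel's zram documentation, the first three fields are orig_data_size, compr_data_size and mem_used_total, all in bytes. The sketch below parses that line and divides the uncompressed size by the memory zram really uses; the class name ZramStats is hypothetical, and the field order is assumed from the documentation for 4.x kernels like the one above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper reading zram's mm_stat (field order assumed from kernel docs). */
public class ZramStats {

    /** orig_data_size / mem_used_total, i.e. stored bytes per byte of RAM zram consumes. */
    static double compressionRatio(String mmStatLine) {
        String[] f = mmStatLine.trim().split("\\s+");
        long origDataSize = Long.parseLong(f[0]);   // uncompressed bytes stored
        long memUsedTotal = Long.parseLong(f[2]);   // RAM used, including allocator overhead
        return memUsedTotal == 0 ? 0.0 : (double) origDataSize / memUsedTotal;
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/sys/block/zram0/mm_stat");
        String line = new String(Files.readAllBytes(p), StandardCharsets.US_ASCII);
        System.out.printf("compression ratio: %.2f%n", compressionRatio(line));
    }
}
```

mem_used_total is used rather than compr_data_size because it includes the allocator's overhead, so the ratio reflects the RAM that is actually consumed.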
