Monday, September 05, 2005

Linux: Swap, file systems and such ...

Just found this interesting read on KernelTrap, posted by Mr Z.

Allow me to elaborate. UNIX filesystems have a concept of "inodes," which hold a file's metadata (its permissions, its ownership and so on) and point to the blocks that store the body of the file. The inodes get linked into directories via names, a.k.a. directory entries. The same inode can be linked into the filesystem in multiple places. (Hence the concept of a "hard link.") The filesystem keeps track of how many links an inode has, and the kernel keeps track of how many processes have opened a given inode. This concept is important, and I will come back to it.
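To make the link-count idea concrete, here is a minimal Python sketch. The function name hard_link_demo and the file names are made up for illustration, and it assumes a POSIX filesystem that supports hard links. It creates a file, links it under a second name, and checks that both names refer to the same inode.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, hard-link it under a second name, and report inode info."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first.txt")    # hypothetical file names
        second = os.path.join(d, "second.txt")
        with open(first, "w") as f:
            f.write("hello")
        links_before = os.stat(first).st_nlink

        # Add a second directory entry that points at the same inode.
        os.link(first, second)
        st1, st2 = os.stat(first), os.stat(second)
        same_inode = st1.st_ino == st2.st_ino
        links_after = st1.st_nlink

        # Removing one name only drops the link count; the data stays
        # reachable through the remaining name.
        os.unlink(first)
        with open(second) as f:
            content = f.read()
        links_final = os.stat(second).st_nlink
    return same_inode, links_before, links_after, links_final, content


if __name__ == "__main__":
    print(hard_link_demo())
```

The link count goes from 1 to 2 when the second name is added and back to 1 when the first name is removed, while the contents stay readable throughout.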

When an executable runs, the executable's file as well as the files for all the libraries it depends on get opened. The pages for these files get mmap()'d into the process' address space as file-backed virtual memory. The memory gets marked copy-on-write, so that any changes to the mmap()'d code result in a fault, and break the file backing. In any case, the file-backed portions are backed by the contents of the inodes themselves.
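The copy-on-write behaviour can be imitated from user space with a private file mapping. This is only an analogy, not what the program loader literally does: the sketch below uses Python's mmap module with ACCESS_COPY (a private, copy-on-write mapping), and the function name private_mapping_demo is invented for the example.

```python
import mmap
import os
import tempfile


def private_mapping_demo():
    """Write through a copy-on-write mapping and show the file is untouched."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original bytes")
        # ACCESS_COPY gives a private mapping: writes affect memory only.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_COPY) as m:
            m[0:8] = b"MODIFIED"      # the touched page gets a private copy
            in_memory = bytes(m[:])
        with open(path, "rb") as f:
            on_disk = f.read()        # the backing file keeps its old contents
    finally:
        os.close(fd)
        os.unlink(path)
    return in_memory, on_disk


if __name__ == "__main__":
    print(private_mapping_demo())
```

The mapping sees the modified bytes while the file on disk is unchanged, which is the "break the file backing" effect described above.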

Under virtual memory pressure, the kernel will have to deallocate physical pages of memory from some processes in order to allocate them to others. There are two strategies available here: Write dirty pages to swap, and discard clean pages. Clean pages are either pages which have an explicit file backing (such as program executable pages), or pages that were previously swapped out, brought back in, and still have an equivalent copy in the swap partition. (This is sometimes referred to as the "swap cache," though I don't know if that designation is accurate.)

So yes, under memory pressure, some pages of an executable might get discarded and will need to be brought in later from the original executable. The grandparent wonders how that works if a user upgrades a binary while the executable runs.

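Recall that there's the separation between the file's contents (the inode) and the name given to it in the file system (hard link to the inode). File descriptors are bound to inodes, not directory entries. When you "rm" a file, you remove the link between the directory and the inode. When you replace a file by writing the new version under a temporary name and renaming it over the old one (which is what "install" and most package managers do), the existing inode gets unlinked and a new inode gets linked in its place. A plain "cp" onto an existing file is different: it rewrites the existing inode in place, and Linux will typically refuse to do that to a running executable with a "Text file busy" error. When you "mv" a file within the same filesystem, it gets linked in its new location, and unlinked from its old location.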
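The difference between these operations shows up in the inode number. The following is a rough Python sketch; the function name inode_after_operations and the file names are hypothetical, and it assumes a POSIX filesystem. It compares an in-place overwrite, a rename, and a replace-by-rename.

```python
import os
import tempfile


def inode_after_operations():
    """Compare inode numbers after overwrite, rename, and replace-by-rename."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "prog")          # hypothetical file names
        with open(target, "w") as f:
            f.write("v1")
        original = os.stat(target).st_ino

        # Overwrite in place: opening with truncation reuses the same inode.
        with open(target, "w") as f:
            f.write("v2")
        same_after_overwrite = os.stat(target).st_ino == original

        # Rename within one filesystem: the name changes, the inode does not.
        moved = os.path.join(d, "prog.moved")
        os.rename(target, moved)
        same_after_rename = os.stat(moved).st_ino == original

        # Replace: write a new file, then rename it over the old name.
        # The old inode is unlinked and a different inode takes the name.
        fresh = os.path.join(d, "prog.new")
        with open(fresh, "w") as f:
            f.write("v3")
        os.replace(fresh, moved)
        same_after_replace = os.stat(moved).st_ino == original

    return same_after_overwrite, same_after_rename, same_after_replace


if __name__ == "__main__":
    print(inode_after_operations())
```

Only the last step swaps in a new inode, which is why it is the safe way to upgrade a file that something else may still have open.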

The filesystem code does not reclaim the space allocated to the inode until all references to the inode drop. This includes all filesystem links and open file descriptors. Thus, when you replace a program's executable while it executes, the currently running program continues to see the old executable, even if the inode doesn't have a visible link in the filesystem. The inode will remain allocated until all of its open file descriptors get closed. Then and only then will the filesystem reclaim the storage associated with the inode.
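This is easy to try out. Here is a minimal Python sketch (the function name read_after_unlink is made up for the example, and a Linux or other POSIX system is assumed) that removes a file's only name while keeping it open, then reads it through the still-open descriptor.

```python
import os
import tempfile


def read_after_unlink():
    """Unlink a file while it is open and read it through the open descriptor."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+") as f:
        f.write("old executable contents")
        f.flush()

        os.unlink(path)                        # remove the only directory entry
        name_exists = os.path.exists(path)     # the name is gone
        links = os.fstat(f.fileno()).st_nlink  # no links left on the inode

        # The inode is still allocated because this descriptor holds it open.
        f.seek(0)
        data = f.read()
    # Closing the file drops the last reference; the space can now be reclaimed.
    return name_exists, links, data


if __name__ == "__main__":
    print(read_after_unlink())
```

The name disappears immediately, but the contents stay readable until the descriptor is closed.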

In fact, it is this property of UNIX-derived filesystems that leads to the orphaned inodes you find in "lost+found/" after an fsck if your system gets shut down abruptly. Any inodes that were open at the point of the crash, but which did not have a directory link, end up here.

In my experience, Windows with NTFS does not behave the way described above. If you develop in M$ Visual Studio and mess a bit with the copy local setting, you'll see referenced files get locked, and there's no way to overwrite them while they are in use. You can simulate the Unix/Linux behaviour to some extent by renaming the referenced file. If, for example, c:\test\a.dll is locked, just rename it to c:\test\b.dll and copy the new version of a.dll to c:\test.

1 comment:

Anonymous said...

How do you replace a file using the cp command?