Context for Global Memory System – Georgia Tech – Advanced Operating Systems

Typically, the virtual address space of a process running on a processor, let’s say your desktop, your PC, your laptop, and so on, is much larger than the physical memory that is allocated to that process. So the role of the virtual memory manager in the operating system is to give the process the illusion that all of its virtual address space is contained in physical memory. But, in fact, only a portion of the virtual address space is really in physical memory, and ideally that portion is the working set of the process, meaning the pages it is actively using. The virtual memory manager supports this illusion by paging in and out from the disk the pages being accessed by the process at any particular point of time, so that the process is happy in terms of having its working set contained in physical memory.
Now, when a node, a desktop, is connected on a local area network to other peer nodes on the same network, it is conceivable that at any point of time the memory pressure, meaning the amount of memory required to keep all the processes running on this node happy, may be different from the memory pressure on another node. In other words, one particular node may have a much higher memory pressure because the workload on that node consumes a lot of physical memory, whereas another workstation may be idle, and therefore its memory is not being utilized for anything useful because no applications are running on it. So this opens up a new line of thought. Given that all these nodes are connected on a local area network, and some nodes may be busy while other nodes may be idle, if a particular node experiences memory pressure, can we use the idle cluster memory? In particular, can we use the available cluster memory for paging in and out the working set of the processes on this node?
Rather than going to the disk, can we page in and out to the cluster memories that are idle at this point of time? It turns out that advances in local area networking gear have already made gigabit Ethernet connectivity available for desktops, and pretty soon 10 gigabit links are going to be common in connecting desktops to the local area network. This makes sending a page to a remote memory, or fetching a page from a remote memory, faster than going to a local disk. Typically, local disk transfer rates are on the order of 200 megabytes per second, but on top of that you have to add things like seek latency and rotational latency for accessing the data that you want from the disk. So all of this augurs well for saying that paging in and out through the local area network to peer memories that are idle may be a good solution when memory pressure is being experienced at any particular node in the cluster.
So the global memory system, or GMS for short, uses cluster memory for paging across the network. In normal memory management, if virtual address to physical address translation fails, the memory manager knows that it can find the page containing this virtual address on the disk, and it pages in that particular page from the disk. Now in GMS, if there is a page fault, that is, this virtual address to physical address translation fails, that means the page is not in the physical memory of this node. In that case GMS is going to say, well, it might be in the cluster memory, in one of my peer nodes, or, if it is not in the cluster memory of my peers, it could be on the disk. So that’s the idea: GMS is integrating the cluster memory into the memory hierarchy.
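To put rough numbers on the network-versus-disk comparison made above, here is a minimal back-of-envelope sketch. Only the 200 megabytes per second disk transfer rate and the gigabit link speed come from the lecture; the page size, seek time, rotational delay, and network round-trip overhead are illustrative assumptions, and the function names are hypothetical.

```python
# Back-of-envelope cost of moving one page. Values marked "assumed"
# are illustrative and do not come from the lecture.
PAGE_BYTES = 4096          # assumed page size
DISK_BYTES_PER_S = 200e6   # lecture: roughly 200 MB/s disk transfer rate
SEEK_S = 5e-3              # assumed average seek latency
ROTATION_S = 4e-3          # assumed average rotational latency
NET_BITS_PER_S = 1e9       # lecture: gigabit Ethernet
NET_OVERHEAD_S = 100e-6    # assumed round-trip and software overhead


def disk_page_time(page_bytes=PAGE_BYTES):
    """Seconds to move one page to or from the local disk."""
    return SEEK_S + ROTATION_S + page_bytes / DISK_BYTES_PER_S


def network_page_time(page_bytes=PAGE_BYTES):
    """Seconds to move one page to or from a peer's memory."""
    return NET_OVERHEAD_S + page_bytes * 8 / NET_BITS_PER_S


if __name__ == "__main__":
    print(f"disk:    {disk_page_time() * 1e3:.3f} ms")
    print(f"network: {network_page_time() * 1e3:.3f} ms")
```

Under these assumptions the disk cost is dominated by seek and rotational latency rather than by the transfer itself, which is the point the lecture makes. Different hardware, solid-state storage in particular, would change the numbers.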
Normally, when we think about the memory hierarchy in a computer system, we say there’s a processor, there are the caches, there’s the main memory, and then there is the virtual memory sitting on the disk. But now in GMS, we’re extending that by saying that in addition to the normal memory hierarchy that exists in any computer system, that is, processor, caches, and memory, there is also the cluster memory. Only if the page is not in the cluster memory do we have to think of going to the disk to get the page that we’re looking for. That’s the idea. So in other words, GMS trades network communication for disk I/O.
And we are doing this only for reads, for reading across the network. GMS does not get in the way of writing to the disk; that is, the disk always has a copy of all the pages. The only pages that can be in the cluster memories are pages that have been paged out and are not dirty. If there are dirty copies of pages, they are written to the disk just as would happen in a regular computer system. In other words, GMS does not add any new causes for worrying about failures, because even if a node crashes, all that you lose is clean copies of pages, belonging to some process, that happened to be in the memory of that node. But those pages are on the disk as well. So the disk always has copies of all the pages. It is just that the remote memories of the cluster are serving as yet another level in the memory hierarchy.
So just to recap the top-level idea of GMS: normally, in a computer system, if a process page faults, that means the page it is looking for is not in physical memory, and it’ll go to the disk to get it. But in GMS, if a process page faults, we know that the page is not in physical memory, but it could be in one of the peer memories. So GMS is a way of locating the faulting page in one of the peer memories, if it is in fact contained in one of the peer memories.
If not, it’s going to be found on the disk. So that’s the idea behind that. So when the virtual memory manager that is part of GMS decides to evict a page from physical memory to make room for the current working set of the processes running on this node, then instead of swapping it out to the disk, it is going to go out on the network, find a peer memory that’s idle, and put that page in the peer memory, so that later on, if that page is needed, it can fetch it back from the peer memory. That’s the big picture of how the global memory system works.
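The big picture above can be sketched as a toy model: eviction places a clean page in an idle peer's memory, and a later page fault checks the peer memories before falling back to the disk. This is only an illustration of the idea, not how GMS is implemented. The names `Page`, `Node`, `evict`, and `handle_page_fault` are hypothetical, the linear scan over peers stands in for whatever lookup mechanism GMS actually uses, and removing the peer's copy when the page is fetched back is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    data: bytes
    dirty: bool = False


@dataclass
class Node:
    name: str
    idle: bool = False
    memory: dict = field(default_factory=dict)  # page id -> Page


def evict(page_id, local, peers, disk):
    """Evict a page from `local`; return the name of the peer that took it."""
    page = local.memory.pop(page_id)
    if page.dirty:
        # Dirty pages are written to disk as in a regular system, so the
        # disk always has a current copy and peers only hold clean pages.
        disk[page_id] = page.data
        page.dirty = False
    for peer in peers:
        if peer.idle:
            peer.memory[page_id] = page
            return peer.name
    return None  # assumption: with no idle peer, only the disk copy remains


def handle_page_fault(page_id, local, peers, disk):
    """Bring a faulting page into `local`; return where it was found."""
    for peer in peers:  # stand-in for GMS's real page-location mechanism
        if page_id in peer.memory:
            # Assumption: the peer gives up its copy when the page moves back.
            local.memory[page_id] = peer.memory.pop(page_id)
            return "cluster"
    local.memory[page_id] = Page(disk[page_id])  # disk always has every page
    return "disk"
```

For example, after `evict("p1", busy, [idle], disk)` puts a page on an idle peer, `handle_page_fault("p1", busy, [idle], disk)` finds it in cluster memory, while a fault on a page that no peer holds falls through to the disk.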