Log Cleaning – Georgia Tech – Advanced Operating Systems

I mentioned that log cleaning has to be done. As client activities go on, log segments evolve on the disk. For instance, on a particular client node some blocks are written to a log segment, that log segment fills up, and now it is sitting on the disk. The blocks contained in this log segment correspond to writes to file blocks one, two, and five. These file blocks may belong to different files, but that is okay: as far as the file system is concerned, segment one is a contiguous file. It is a log segment, but it is a file, and that is what resides on the disk.

On the client node, block one may get overwritten due to activity on the client. So now we have new contents for that same file block one, one double prime. Once this new content has been created, the old copy is no longer valid; block one has been overwritten, which means we have to kill the old block that was in segment one. So there is a new segment in which we wrote the contents of the new file block, one double prime, and we have to go back to the old segment and kill the stale copy of that block. We don't need it anymore. So you can see that as client activities progress, we are going to create holes in the segments. Remember that these segments are persistent data structures on the disk. Segment one was valid at some point of time, but once that particular block one was overwritten on the client node, this segment no longer contains the latest and current copy of that file block. Therefore we nuke that file block, and so we create a hole in this log segment. Subsequently, let's say the client writes to other file blocks, three and four. So log segment two contains one double prime, three prime, and four prime, and segment one contains two prime and five prime. One prime is not relevant anymore because it has been overwritten by one double prime. Activities continue on the client box and a third segment is created, and that third segment contains two double prime. That is, block number two is overwritten by new contents, two double prime. When this happens, the file system has to create another hole by nuking two prime to indicate that this block is not relevant anymore, because there is a more recent version of the block in segment three. So block two is overwritten, killing the old block.

Now you see that as client activities progress in the entire distributed system, we are creating a number of these log segments on the disk, and these log segments may progressively have holes in them because their blocks have been overwritten by new contents through activities on the client machines. This is what log cleaning is all about. It has to do with cleaning up the disk and getting rid of all of this junk so that we don't fill up the disk unnecessarily. So for example, what we want to do is recognize that we have three segments here, and the segments have holes in them. What we are going to do is aggregate all the live blocks from all of these segments into a new segment. We've got five prime from the first segment that is still alive; from the second segment we've got one double prime, three prime, and four prime that are still alive; and from the third segment we've got two double prime that is still alive. So we have coalesced all of the live blocks from the existing segments into one new segment.
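To make the example concrete, here is a minimal sketch in Python of the segment evolution just described. The LogSegment class and the version tags such as "1'" and "1''" are hypothetical illustrations, not xFS data structures; the point is simply how overwrites punch holes in old segments and how the surviving live blocks can later be coalesced.

    class LogSegment:
        """A hypothetical on-disk log segment: file block number -> version tag."""
        def __init__(self, seg_id):
            self.seg_id = seg_id
            self.blocks = {}

        def append(self, block_no, version):
            self.blocks[block_no] = version

        def kill(self, block_no):
            # Overwriting a file block elsewhere makes this copy stale: a "hole".
            self.blocks.pop(block_no, None)

        def live_blocks(self):
            return dict(self.blocks)

    # Segment 1 holds writes to file blocks 1, 2, and 5.
    seg1 = LogSegment(1)
    seg1.append(1, "1'"); seg1.append(2, "2'"); seg1.append(5, "5'")

    # Block 1 is overwritten: the new contents 1'' go into segment 2,
    # and the stale copy in segment 1 is killed, leaving a hole.
    seg2 = LogSegment(2)
    seg2.append(1, "1''"); seg1.kill(1)
    seg2.append(3, "3'"); seg2.append(4, "4'")

    # Block 2 is overwritten in segment 3, killing the old copy in segment 1.
    seg3 = LogSegment(3)
    seg3.append(2, "2''"); seg1.kill(2)

    # Log cleaning: coalesce the surviving live blocks into one new segment,
    # after which segments 1, 2, and 3 can be garbage collected.
    new_seg = LogSegment(4)
    for seg in (seg1, seg2, seg3):
        for block_no, version in seg.live_blocks().items():
            new_seg.append(block_no, version)

    print(new_seg.live_blocks())  # {5: "5'", 1: "1''", 3: "3'", 4: "4'", 2: "2''"}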
Once we have aggregated the live blocks into one new segment, all of the old log segments can be garbage collected, and that is what log cleaning is all about. If you think about it, this is very similar to the way we described cleaning up the diffs that are created in the DSM system TreadMarks. The same thing is happening here, except that these data structures are on the disk: we conserve space on the disk by garbage collecting all the old log segments once we have aggregated the live blocks in those segments into a new segment.

So this is what log cleaning is all about, and in a distributed file system there is a lot of garbage being created all across the storage servers. We don't want this cleaning to be done by a single manager; we would ideally like it to be done in a distributed manner. This is another step towards a true distributed file system, making log cleaning also a distributed activity. The log cleaner's responsibilities include the following. It has to find the utilization status of the old log segments. Then it has to pick some set of log segments to clean, and once it has picked them, it has to read all the live blocks that it finds in the log segments chosen for cleaning and write them into a new log segment. Once it has done that, it can garbage collect all of those old log segments. This is the cleaning activity that an LFS cleaner has to do, and in xFS they distribute this log cleaning activity as well.

Now remember that this log cleaning activity is happening concurrently with writing to files on the nodes in the distributed system. So there are lots of subtle issues involved in managing log cleaning in parallel with new activity that may be creating new log segments or writing to existing log segments. I encourage you to read the paper that I've assigned to you, the xFS paper, to get a good feel for all the subtleties involved in managing log cleaning concurrently with writing to the files.

So in xFS they make the clients also responsible for log cleaning. There is no separation between client and server; any node can be a client or a server depending on what it is doing, and each client, meaning a node that is generating log segments, is responsible for the segment utilization information for the files that it is writing. After all, the activity is happening at the client end in terms of creating new files, and new file writes manifest as new blocks in log segments in xFS. So the clients are responsible for knowing the utilization of the segments that are resident at their nodes. And since we have divvied up the entire space of servers into stripe groups, each stripe group is responsible for the cleaning activity in that set of servers. Every stripe group has a leader, and the leader of that stripe group is responsible for assigning cleaning tasks to the members of that stripe group. Recall that the manager is responsible for the integrity of the files because it is doing the metadata management, and requests for reading and writing files are going to come to the manager. The log cleaning responsibility, on the other hand, goes to the leader of a stripe group, and the manager is the one responsible for resolving conflicts that may arise between client updates that want to change some log segments and cleaner functions that want to garbage collect some log segments.
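As a rough illustration of those cleaner responsibilities, here is a sketch of one cleaning pass, reusing the hypothetical LogSegment class from the earlier sketch. The function names and the utilization threshold are assumptions made for illustration, not the policy xFS actually uses.

    def utilization(segment, capacity):
        # Fraction of the segment's capacity still occupied by live blocks.
        return len(segment.live_blocks()) / capacity

    def clean(segments, capacity, threshold=0.5):
        # 1. Find the utilization status of the old log segments and
        #    pick a set of sparsely utilized segments to clean.
        candidates = [s for s in segments if utilization(s, capacity) < threshold]
        if not candidates:
            return segments

        # 2. Read all the live blocks from the chosen segments and
        #    write them into a new log segment.
        new_seg = LogSegment(max(s.seg_id for s in segments) + 1)
        for seg in candidates:
            for block_no, version in seg.live_blocks().items():
                new_seg.append(block_no, version)

        # 3. Garbage collect the cleaned segments; only the untouched
        #    segments and the newly coalesced segment remain.
        return [s for s in segments if s not in candidates] + [new_seg]

In xFS such a pass would not run on a single node: the clients do the cleaning, the stripe group leader hands out the cleaning work, and any conflicts with ongoing client writes are resolved by the manager, as described above.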
Those conflicts are resolved by the manager. These again are subtle details which I want you to read carefully in the paper.