Monday, October 13, 2008

Contig and World of Warcraft

Okay, as promised, I am continuing yesterday's explanation. Read the previous blog post if this seems confusing to you.

Basically, file fragmentation happens when you patch the WoW client because, while it's making changes to the giant archive files, it can't force the OS to write to the exact same places on the disk as before. So instead of the file staying exactly where it is on the disk, bits and pieces of it get written to whatever presumably reasonable-sized empty spots the OS can find. In practice this works well enough, but in case you haven't noticed, the patch for tomorrow amounts to over a gigabyte of changes in total. It's going to create a lot of new fragments.

A bit more about contiguous reads and disk cache first. Your hard drive (unless you bought something really cheap) has a bunch of actual RAM on it, which the drive uses so that it can operate asynchronously from the operating system and hopefully be more efficient. When your OS calls for the information stored in sector 9,242, the drive will go and read that sector and then start filling its cache with what's in several of the sectors immediately following it, in hopes that they're going to be read next. If the OS is reading one file that's contiguous on the disk, this works out great. The computer might be busy doing something else the moment it gets that sector from the drive and might not be ready to ask for the next sector just yet, but when it does ask, the drive will have it ready, so it'll zip right across the drive cable from the RAM in the drive (which is much faster than the disk platters themselves) to the computer.

Think of it like two guys working to move everything from one box into another. If there's no cache, the guy with the empty box says "give me something" and the guy with the full box reaches down, grabs something, and hands it to him. Then he stands there waiting while the empty-box guy puts the thing in the empty box, turns back to him, and says "Okay, give me something else". The key thing there is that the full-box guy stands there waiting.

To represent the function of a disk cache, we'll add a small table to this scenario and place it right between them. Instead of one guy handing something directly to the other, he'll put it on the table, where the other one will pick it up. This lets the two guys work independently of each other. Now the empty-box guy asks for something, and the full-box guy can start pulling things out of his box and putting them on the table as fast as he can until the table is full. The empty-box guy can just grab the stuff off the table and take as much time as he needs to put it into his box. This of course becomes even more efficient if there are two guys with empty boxes (or, in the case of your computer, two simultaneous read operations going on).
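
If you'd rather see the table trick as numbers, here's a rough toy model in Python. It is only a sketch: the function name `count_platter_reads`, the 15-sector read-ahead, and the idea that the cache holds just one run of sectors at a time are all assumptions I made up for illustration, not how any particular drive actually works.

```python
def count_platter_reads(requests, readahead):
    """Count how many times a toy drive has to go to the platters.

    requests  -- sector numbers in the order the OS asks for them
    readahead -- how many following sectors the drive prefetches (assumed)
    """
    cache = set()          # sectors currently sitting in the drive's RAM
    platter_reads = 0
    for sector in requests:
        if sector in cache:
            continue       # already on "the table", no platter access needed
        platter_reads += 1
        # Read the requested sector plus the next `readahead` sectors.
        cache = set(range(sector, sector + readahead + 1))
    return platter_reads


contiguous = list(range(9242, 9242 + 64))          # one unbroken run of 64 sectors
scattered = [9242 + i * 1000 for i in range(64)]   # 64 sectors, all far apart

print(count_platter_reads(contiguous, readahead=15))  # 4
print(count_platter_reads(scattered, readahead=15))   # 64
```

Same 64 sectors either way, but in this toy model the contiguous layout only sends the drive to the platters 4 times, while the scattered one sends it there for every single sector.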

The thing that makes non-contiguous reads slow is that if the full-box guy can't make assumptions about what the empty-box guys want next (or if those assumptions are wrong because the file isn't neatly arranged on the platters), he'll spend much more time stooping over and straightening up again, getting one thing at a time, when he could be using both hands to just put things on the table one after the other. That stoop-and-straighten can be heard (unless you have a totally silent drive) as a tiny clicking sound every time the drive's heads have to move to another track on the disk. When there's lots of searching for new places on the disk going on, you hear more than just a tiny click: you hear a sound like teeny-weeny machine gun fire as the disk heads go all over to different places on the disk. Suffice it to say, this noise means the drive is reading much more slowly than it could be, and you don't want to spend any more time than you have to staring at that progress bar.
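
To put a rough number on the clicking, here's another small Python sketch. It treats a file as a list of (start sector, length) pieces and counts how many times the heads have to jump somewhere new. The function name `count_seeks` and the sector numbers are made up for illustration, and a real drive has more going on than this (tracks, multiple platters, and so on).

```python
def count_seeks(extents):
    """Count head repositionings when reading a file laid out as extents.

    extents -- list of (start_sector, length) pairs, in file order
    """
    seeks = 0
    position = None            # sector just past the last one read
    for start, length in extents:
        if start != position:
            seeks += 1         # the heads have to move: one "click"
        position = start + length
    return seeks


one_piece = [(1000, 4096)]
fragmented = [(1000, 1024), (90000, 1024), (5000, 1024), (42000, 1024)]

print(count_seeks(one_piece))    # 1
print(count_seeks(fragmented))   # 4
```

Both layouts hold the same 4096 sectors, but the fragmented one needs four head movements instead of one. Scale that up to a multi-gigabyte archive with thousands of fragments and you get the machine gun noise.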

With contig around to clean the filesystem up a bit, you can make sure that when the computer is just trying to read a single big file, it can do so as quickly and efficiently as possible. (Don't worry too much about what the other things you have going in the background might be doing, because this "degrades gracefully" unless the OS sucks even more than Windows does.)
