Saturday, January 30, 2010

Directories and File-System, RaleighFS Way!

When you talk about a new file-system to someone, one of the first question that you will hear is "It will suffer from directory unbalancing" or "How performs with larger directories".

Ext3 seems to have spread the fear of large directory, but What is a directory? and what large directories mean?

When you design a File-System you've to decide what is your target/end-user. Don't try to design your file-system for everyone, but just for one.

So, what's your focus? Be the fastest file-system for continuous 'ls' (Directory traversing)?

My idea and RaleighFS idea, is that File-System is just a "Big Hashtable" with some useful information, where you ask for an item 'x' and you'll receive it's data, and metadata. Following this idea, what is a directory? It's just a name-container that allows user to keep track of his items. At this point, what is the best way to represent Directory?

Directory is just another item in the FS, like your mp3, c source file, and all the other stuff.


When you're a high-level developer, sometimes you forget how the low-level works, in this case the File-System. The FS has blocks when you remove something from the begin/mid of a file you shift back all the rest and you rewrite the entire file to avoid this you can intelligently pack your data based on the "block-data size" of the file system. (Block data size is not the block size, maybe the file-system add its own block header to your data).

The above picture shows the RaleighFS Directory Block data structure. There's a small block header that say how much of space are free in this block and there're directory items that are simply 2-field struct, 1 byte name size and N chars for the name.
block_size = 8192 byte
block_head = 12 byte
dir_head = 2 byte
avg_dir_item_size = 1byte + 15 byte (avg name size)
(block_size - block_head - dir_head) / avg_dir_item_size = 511 entries in one block

In a few blocks you can handle large amount of files, and the remove operation are really quick. Just find your item, move back just your "block" friends and decrease the Directory Header free space. In this way you don't have to rewrite all the item but just one or few blocks of it.

Remember, in RaleighFS, directory is just a name-container that allows user to keep track of his items (ls /home/) and on disk directory is just a file there's no unbalance. Every item is looked-up in a  O(1) Time, like traditional Hashtables.

No comments:

Post a Comment