Sunday, February 26, 2012

Thoughts on bucket flushing, block size and compression...

This post is just a collection of thoughts on how to store data. I hope to get some feedback and new ideas from you guys, thanks!

Since I started working on B*Trees, I've always assumed a fixed block size. With this "restriction" & "simplification", you can easily come up with a block in the following format:
Keys are packed together at the beginning of the block and values are packed together at the end, both growing toward the center. This way you don't have to move data when a new key is added, or shift keys & values when a value grows.
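
To make that concrete, here's a minimal sketch in C of such a block. Names like block_t and block_add are just illustrative, not from any real codebase:

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096

typedef struct block {
    uint8_t data[BLOCK_SIZE];
    uint32_t key_end;     /* keys occupy data[0 .. key_end) */
    uint32_t value_start; /* values occupy data[value_start .. BLOCK_SIZE) */
} block_t;

static void block_init (block_t *block) {
    block->key_end = 0;
    block->value_start = BLOCK_SIZE;
}

/* Returns 0 on success, -1 if the two regions would collide. */
static int block_add (block_t *block,
                      const void *key, uint32_t ksize,
                      const void *value, uint32_t vsize)
{
    if (block->key_end + ksize + vsize > block->value_start)
        return(-1); /* block is full */

    /* keys grow from the front... */
    memcpy(block->data + block->key_end, key, ksize);
    block->key_end += ksize;

    /* ...values grow from the back, toward the center. */
    block->value_start -= vsize;
    memcpy(block->data + block->value_start, value, vsize);
    return(0);
}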

Assuming that your inserts into the block are already sorted (e.g. when flushing a "memstore" to disk), you can even compress keys & values with different algorithms.
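
For example, reusing the block_t above, the flush step could compress the two regions independently. Here zlib's compress2() is just a stand-in for "different algorithms": a heavier level for keys (which tend to share prefixes) and a lighter one for values. Link with -lz:

#include <stdio.h>
#include <zlib.h>

/* Flush a block, compressing the key region and the value region
 * separately (block_t and BLOCK_SIZE are from the sketch above). */
static int block_flush (const block_t *block, FILE *stream) {
    uLongf ksize = compressBound(block->key_end);
    uLongf vsize = compressBound(BLOCK_SIZE - block->value_start);
    Bytef kbuf[compressBound(BLOCK_SIZE)];
    Bytef vbuf[compressBound(BLOCK_SIZE)];

    /* "different algorithms": here just different zlib levels */
    if (compress2(kbuf, &ksize, (const Bytef *)block->data,
                  block->key_end, Z_BEST_COMPRESSION) != Z_OK)
        return(-1);
    if (compress2(vbuf, &vsize, (const Bytef *)block->data + block->value_start,
                  BLOCK_SIZE - block->value_start, Z_BEST_SPEED) != Z_OK)
        return(-1);

    /* write the sizes first, so a reader can split the two regions */
    fwrite(&ksize, sizeof(ksize), 1, stream);
    fwrite(&vsize, sizeof(vsize), 1, stream);
    fwrite(kbuf, 1, ksize, stream);
    fwrite(vbuf, 1, vsize, stream);
    return(0);
}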

...but with a fixed-size block, and values starting from the end, you need to allocate the full block up front.

In contrast, you can "stream" your data and flush it just a few KB at a time. In this case you don't have to allocate a full block, but you lose the ability to use different compression for keys & values, and the ability to keep only the keys in memory without doing memcpys.
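
A sketch of this streaming variant, again with illustrative names: a small buffer is flushed every few KB, so no full block is ever allocated, but keys & values end up interleaved on disk:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FLUSH_THRESHOLD (8 << 10)   /* flush every ~8 KiB */

typedef struct stream_writer {
    uint8_t buffer[FLUSH_THRESHOLD];
    uint32_t used;
    FILE *stream;
} stream_writer_t;

static void writer_flush (stream_writer_t *writer) {
    fwrite(writer->buffer, 1, writer->used, writer->stream);
    writer->used = 0;
}

static void writer_append (stream_writer_t *writer,
                           const void *data, uint32_t size)
{
    const uint8_t *p = (const uint8_t *)data;
    while (size > 0) {
        uint32_t avail = FLUSH_THRESHOLD - writer->used;
        uint32_t n = (size < avail) ? size : avail;
        memcpy(writer->buffer + writer->used, p, n);
        writer->used += n;
        p += n;
        size -= n;
        if (writer->used == FLUSH_THRESHOLD)
            writer_flush(writer);
    }
}

/* keys and values are interleaved: key, value, key, value, ... */
static void writer_add (stream_writer_t *writer,
                        const void *key, uint32_t ksize,
                        const void *value, uint32_t vsize)
{
    writer_append(writer, key, ksize);
    writer_append(writer, value, vsize);
}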

Another possibility is to traverse the data twice, writing the keys on the first pass and the values on the second. This way you regain the per-region compression, but if you're using compression you can't stop at a specified block-size threshold, because you don't know in advance how much space each key & value will take once compressed.
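
Here's what the two-pass variant could look like, reusing the stream_writer_t from the previous sketch: one writer per region, each of which could feed its own compressor. entry_t and the names are, again, just illustrative:

typedef struct entry {
    const void *key;
    uint32_t ksize;
    const void *value;
    uint32_t vsize;
} entry_t;

static void flush_two_pass (const entry_t *entries, uint32_t count,
                            stream_writer_t *keys_out,
                            stream_writer_t *values_out)
{
    uint32_t i;

    /* pass 1: keys only (could feed a key-specific compressor) */
    for (i = 0; i < count; ++i)
        writer_append(keys_out, entries[i].key, entries[i].ksize);

    /* pass 2: values only (could feed a different compressor) */
    for (i = 0; i < count; ++i)
        writer_append(values_out, entries[i].value, entries[i].vsize);
}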

Any comments, thoughts, ideas?
