28 March 2009

Opinion and Speculation: Log Structured vs Traditional Block

Since at least early 2006, when I first met and listened to Jim Starkey at the MySQL Athens meeting, I have been firmly convinced that log structured databases are the future for disc-based storage devices. Their largely sequential write pattern ideally suits modern drives, which are optimized to write whole tracks of data at a time. I even proposed such a project to Monty and Brian at that meeting. Of course, they said that I should go ahead and write one, but circumstances conspired against me and I never really progressed much beyond the experimental proof of concept stage. My time was occupied with the aborted Amira project and with providing some assistance to the early Falcon project.

For that reason, when I first heard of PBXT, I was very excited. I have told many people to keep an eye on that project because, although it was slower at the time, it would catch up with and then surpass traditional block storage databases such as InnoDB.
It's taken a while but ... a big THANK YOU to Paul and his team at Primebase for ensuring that I do not have to eat my words after all the talking up I have done about PBXT for the past couple of years.

The game-changer in the near future is Flash storage and other solid state media. Such technologies mean that there is no seek or head settle time, so optimization strategies like clustered indexes become obsolete. However, Flash does have some overhead in writing, and it is preferable to write whole flash blocks at a time. Right now, I believe that most Flash media use 64 KB blocks, but as Flash media increase in size and performance by increasing the bit-width, the effective block size of the media will increase. Log-structured storage can optimize for this because all writes are consolidated into a block, and there is little penalty for the index being scattered across many segments because there is no seek penalty.
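To make the write-consolidation idea concrete, here is a minimal sketch in Python. It is not how PBXT or any real engine works; the `LogStore` class, its method names, and the use of an in-memory `io.BytesIO` as a stand-in for the device are all assumptions made for illustration. Records are appended to a buffer, the buffer is written out only in whole blocks, and an in-memory index remembers where the newest version of each key lives.

```python
import io

BLOCK_SIZE = 64 * 1024  # assumed block size, taken from the 64 KB figure above


class LogStore:
    """Toy append-only store (hypothetical): writes reach the device one whole block at a time."""

    def __init__(self, device=None, block_size=BLOCK_SIZE):
        # The "device" is any seekable binary file object; BytesIO stands in for flash.
        self.device = device if device is not None else io.BytesIO()
        self.block_size = block_size
        self.buffer = bytearray()  # records not yet written to the device
        self.flushed = 0           # number of bytes already on the device
        self.index = {}            # key -> (log offset, length) of the newest version
        self.block_writes = 0      # how many whole-block writes were issued

    def put(self, key, value):
        """Append a new version of `key`; older versions simply stay behind in the log."""
        offset = self.flushed + len(self.buffer)
        self.buffer += value
        self.index[key] = (offset, len(value))
        while len(self.buffer) >= self.block_size:
            self._flush_block()

    def _flush_block(self):
        """Write exactly one full block sequentially at the end of the log."""
        block = bytes(self.buffer[:self.block_size])
        self.device.seek(self.flushed)
        self.device.write(block)
        del self.buffer[:self.block_size]
        self.flushed += len(block)
        self.block_writes += 1

    def get(self, key):
        """Read the newest version; a record may straddle the device and the buffer."""
        offset, length = self.index[key]
        end = offset + length
        data = b""
        if offset < self.flushed:
            self.device.seek(offset)
            data = self.device.read(min(end, self.flushed) - offset)
        if end > self.flushed:
            start = max(offset, self.flushed) - self.flushed
            data += bytes(self.buffer[start:end - self.flushed])
        return data


if __name__ == "__main__":
    store = LogStore(block_size=16)  # tiny block so a flush is easy to trigger
    store.put("a", b"hello")
    store.put("b", b"world!!")
    store.put("a", b"HELLO-AGAIN")   # update is appended, not written in place
    print(store.get("a"), store.block_writes)
```

Note that the update to key `a` never touches the old record: it is appended and the index is repointed, which is why the reads end up scattered across the log. On media without a seek penalty that scattering costs little, which is the point made above. A real engine would also need crash recovery and compaction of dead versions, both omitted here.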

After Flash... Let us imagine a future where we have some form of ultra-fast memristor-based storage which supplants DRAM, Flash and disc media. Then all this talk about database storage engines becomes practically moot ... just wire up the memristor memory to your 64-bit CPU with its 64-bit address bus. Provide record version control, perhaps by some form of in-memory index, perhaps vaguely log-structured, but with no need for contiguous segments, in order to scale on NUMA architectures (which Intel is now transitioning to with their new HyperTransport-inspired workalike).