We’re taking a whistle-stop tour of some of the column-based storage systems out there for a project we’re working on (where the use case seems to fit this form of storage better than straight MySQL). After reading through the series of articles on the MySQL Performance Blog, we chose to look at InfiniDB, Infobright and MonetDB - with the two that talk MySQL coming first for ease of integration right now. I’m also going to do this as a three-parter - so first up is InfiniDB.
InfiniDB was recently released as Open Source by Calpont. The science bit says:
InfiniDB Community Edition provides a scale-up analytics database engine for your data warehousing, business intelligence and read-intensive application needs. Enabled via MySQL and purpose-built for an analytical workload with column-oriented technology at its core, the multi-threaded capabilities of InfiniDB Community Edition fully encompass query, transactional support and bulk load operations.
Handily for us, there are RHEL 5 RPMs available to download so we were able to suck them down and install them - for this trial I used version 0.9.5.1. Following the instructions (we believe fully!) yielded a system that wouldn’t start (and yes, we had disabled the existing MySQL install to ensure no port conflict!). Wiping everything and reinstalling gave us a working system.
There are some limitations to the data types supported - there is no support for unsigned numbers, and varchars are limited to a length of 255. I expect this is a design decision: in a database structured around InfiniDB, longer strings would presumably be stored in a MyISAM table and referenced from there.
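To get a feel for how much of an existing schema runs into these two limits, a rough pre-flight check can be sketched in Python. This is only an illustration: the function name, the example table and the simple line-by-line regex matching are all made up for this post, it is not an InfiniDB tool, and the limits are the ones described above for the 0.9.5.1 alpha.

```python
import re

# Limit as described in this post for InfiniDB 0.9.5.1 (alpha); other releases may differ.
MAX_VARCHAR = 255


def find_unsupported_columns(create_table_sql, max_varchar=MAX_VARCHAR):
    """Return (line, reason) pairs for lines of a CREATE TABLE statement
    that use an unsigned type or a varchar longer than max_varchar.

    Hypothetical helper: it matches text line by line and does not parse SQL.
    """
    problems = []
    for raw in create_table_sql.splitlines():
        line = raw.strip().rstrip(",")
        if re.search(r"\bunsigned\b", line, re.IGNORECASE):
            problems.append((line, "unsigned type"))
        match = re.search(r"\bvarchar\s*\(\s*(\d+)\s*\)", line, re.IGNORECASE)
        if match and int(match.group(1)) > max_varchar:
            problems.append((line, "varchar longer than %d" % max_varchar))
    return problems


# Made-up table definition, purely for illustration.
example_ddl = """
CREATE TABLE page_views (
  id INT UNSIGNED NOT NULL,
  url VARCHAR(1024),
  referrer VARCHAR(255),
  viewed_at DATETIME
)
"""

for line, reason in find_unsupported_columns(example_ddl):
    print(reason + ": " + line)
```

Anything it flags would need a signed type, a shorter column, or a lookup into a separate table before the definition could be created in InfiniDB.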
A nice feature is support for DML operations - so you can do insert, update and delete.
After successfully exporting a table from MySQL and importing it into InfiniDB using their colxml and cpimport tools, we had a table of data and were able to run some simple queries, select count(*) for example. However, after a tea break the queries were no longer running and were giving an error:
ERROR 122 (HY000): There was an internal error encountered in the Calpont Engine while processing this query. The query was cancelled. You may resubmit it if you like. The error is lost Connection to ExeMgr. If the problem continues, please contact your system administrator.
Not having the time or patience to dig deeper, I blitzed it once more and did another clean install. I then went to create the same table as earlier and boom! I saw the load climb and then, typically, my wireless dropped out. When it came back I was unable to connect back in, and our monitors were showing load sitting at 10. We left it 10 minutes, then rebooted. After the reboot, creating the same table took 20 seconds.
We then successfully imported 4 tables of data and proceeded to start our list of queries to test - the first one caused load to rise to 10 again and a reboot was required. This one has been raised as a bug!
I decided to leave it there for the day. The InfiniDB system looks promising on paper, and has been written by people infinitely more intelligent than I am, but right now it’s not stable enough for me to want to go any further. It is only in Alpha right now so bugs are to be expected - I’ll definitely be revisiting it when it’s a bit more mature or when I’ve got time to go through and explore the bugs properly so I can contribute back.
Next up are Infobright and MonetDB - I’m tackling Infobright today so will see how things go!
Thank you for taking the time to download and try the InfiniDB software. You correctly mentioned in your blog post that the software is currently in an Alpha state, one that we hope the Community, like you, will help us advance quickly. We depend on this type of feedback, so we’re grateful for learning about your experience so far.
You’ll be happy to know that we are currently working on extending support for VARCHAR from 255 to 8000, so hopefully this will help with whatever string data you are attempting to store. That should be available to the Community here within the next few weeks. Also, I want to thank you for using the Launchpad bug tracking system to enter the above bug. We’ve been able to duplicate your issue and that has been assigned to a developer. This fix should also be available shortly.
Further on the issue that requires a reboot, it’s possible that there is also a runtime setting that needs to be adjusted to match the amount of memory available on the server that could be contributing to your experience. For table join operations, we have a setting within the calpont.xml configuration file called UmMaxMemorySmallSide. If it is unchanged on your system, it is likely currently set to 4G. If you have 4GB or less of memory available on the machine that you are running on, there is a chance that you could be running out of system memory for a given operation and thus ending up in the state mentioned above. A safe setting would be in the neighborhood of 25% of available memory on the system, so for a server with 4GB, try setting this value to 1G. In addition, there is also a NumBlocksPct setting that potentially will need adjustment as well to keep in balance with the UmMaxMemorySmallSide setting. Both of these settings and more are covered in our free Performance and Tuning Guide, available on infinidb.org for registered users. If you have not taken the time to do so, I would strongly recommend taking a moment to register and download that document…I suspect that it will prove to be a useful asset for you.
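The 25% rule of thumb above can be worked through with a small Python sketch. The function name is hypothetical and nothing here reads or writes the configuration file; it only does the arithmetic. The "G" suffix follows the values quoted above, while the "M" fallback for sizes that are not whole gigabytes is an assumption that should be checked against the Performance and Tuning Guide.

```python
def suggest_small_side(total_memory_bytes, fraction=0.25):
    """Suggest a UmMaxMemorySmallSide value as a fraction of system memory.

    Hypothetical helper for illustration only. Returns a string such as "1G".
    The "M" form used for non-whole gigabytes is an assumed format.
    """
    mib = int(total_memory_bytes * fraction) // (1024 * 1024)
    if mib >= 1024 and mib % 1024 == 0:
        return "%dG" % (mib // 1024)
    return "%dM" % mib


GIB = 1024 ** 3

# A 4GB server, as in the example above, comes out at 1G.
print(suggest_small_side(4 * GIB))

# A 16GB server would land on the 4G value mentioned above.
print(suggest_small_side(16 * GIB))
```

The sketch does not cover NumBlocksPct; how that value should be balanced against UmMaxMemorySmallSide is left to the guide.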
As you continue your evaluation, please continue to use Launchpad or the InfiniDB Forums to log your issues. We’ll get on them just as quickly as we can once they’re reported.
We certainly want you and everyone out there to have a positive experience with the software so we’re here to help. I would be happy to get on the phone with you or would also be happy to work through your issues on this blog, on the community forums or on Launchpad. Just let us know what works for you.
I thought I would just take a moment to catch you up on our progress at Calpont with InfiniDB. Since my last post, we have moved the Community Edition forward to a “final” status and, as of last week, we have launched the commercial Enterprise Edition at http://calpont.com. The Enterprise Edition has everything the Community Edition has, but then adds scale-out MPP processing allowing queries to not only be parallelized across cores like the Community Edition, but across servers as well.
A number of other folks have been doing comparison tests of the open source columnar technology similar to this effort. Here’s a link to some testing that Vadim at Percona has completed.
Let me know if there is anything we can do to answer questions for you.