Day 3 of my database exploration mission brings me to MonetDB. Binary downloads are available for Debian, Fedora, Ubuntu and (strangely!) Windows! If we still had any Windows users left here at HQ then it’d be a rare treat, but instead (as usual) our platform of choice (Centos 5) isn’t directly available in binary form. We downloaded the Fedora source RPMs and built our own - in case they’re of any use then I’ve put them up on a Google Code site for others to download.
After installing the RPMs then you’re ready to get started - before you can do anything you have to start the
merovingian process (you could either setup an init script, or run the binary manually for now). For information, the instructions say:
merovingian is a daemon process that controls a collection of database servers, i.e. mserver5 processes, each looking after a single physical database. Start this program to gain access to your MonetDB database farm. merovingian is designed to be used in a system initialisation script in production environments.
With merovingian running then you’re ready to create a database - for this you use
monetdb - and then start the database using the same command for example:
> monetdb create twf successfully created database 'twf' > monetdb status name state twf stopped > monetdb start twf starting database 'twf'... done > monetdb status name state twf running
You now have a running database and can connect to it using
mclient. This is similar to most command line clients where you can perform changes to your database as well as query for data.
The first step to transferring the database was as usual - inspect the schema on our MySQL database and update it to make the correct use of the supported data types. As with the other systems, there’s no support for unsigned values, it also wasn’t immediately obvious to me what the maximum length of a varchar is.
With the tables created it was time to try and migrate some data. Given MonetDB has been around for quite a while then there seemed to be pretty scarce resources with any detailed instructions - I couldn’t, for example, find any simple migration tools or documentation detailing the best path for migration. I guess this could be because MonetDB is more often tackled by people with bigger brains or with more time to figure things out.
I attempted to use the following to dump data from MySQL:
SELECT * FROM h INTO OUTFILE '/dbtmp/tmp/h' FIELDS TERMINATED BY "|" ENCLOSED BY '"';
And then the following to import into my MonetDB table:
COPY 1000000 RECORDS INTO h FROM '/dbtmp/tmp/h' USING DELIMITERS '|','\n', '"' NULL AS '';
This yielded reasonable results - though I did have to do some tidying up in the middle with
sed - in the end I gave up as there were some string values causing me problems, so I decided to rest on it and went to bed!
In the morning I came back to find the merovingian process was dead, and the status of the database was showing as crashed. I started up the processes and took a look at the status - it said the health was 67% so I’m not really sure what’s going on with it!
In the time I had available I was only able to get a 1 million row table imported successfully to play with - a shocking performance I know, but MonetDB was being quite fussy and I wasn’t pressing the right buttons! I did run a few tests and also ran them against the same dataset in MySQL for comparison, all are run from cold - i.e. MySQL and MonetDB are both restarted before each query. I don’t expect these queries to be representative of real world cases, I was just thinking of some nasty queries that I could throw at a single table in order to cause some pain.
sql>SELECT COUNT(*) FROM h; +---------+ | L1 | +=========+ | 1000000 | +---------+ 1 tuple Timer 1.532 msec 1 rows
MySQL takes 250msec for the same query.
sql>SELECT COUNT(*) FROM h GROUP BY intcolumn; +-------+ | L2 | +=======+ +-------+ 65 tuples Timer 142.260 msec 65 rows
In comparison, MySQL takes 420msec:
sql>SELECT COUNT(*) FROM h GROUP BY varcharcolumn; +-------+ | L1 | +=======+ +-------+ 12743 tuples Timer 1464.389 msec 12743 rows
For comparison, MySQL takes 44,000msec.
sql>SELECT COUNT(*) AS TOTAL FROM h GROUP BY varcharcolumn ORDER BY total; +-------+ | L1 | +=======+ +-------+ 12743 tuples Timer 1496.537 msec 12743 rows
For comparison, MySQL takes 37,500msec.
sql>SELECT COUNT(*) AS TOTAL FROM h GROUP BY varcharcolumn, anothervarcharcolumn ORDER BY total; +-------+ | L1 | +=======+ +-------+ 69696 tuples Timer 4170.520 msec 69696 rows
For comparison, MySQL takes 373,000msec (not a typo, it’s more than 6 minutes).
Obviously this quick trial of each of these is not comprehensive enough to make any solid comparisons of performance - the next step will be for me to go through and come up with a proper test plan in order to be a little more methodical about things.
However, it has given me a good grounding in how the 3 systems compare with respect to installing and getting started.
I’ll be keeping a close eye on InfiniDB - while not stable enough right now, I’m sure they’ll keep things rolling and I look forward to taking another look.
If I can overcome the import obstacles and also the different ‘feel’ of MonetDB then the basic query results make a compelling case for taking a further look - there’s also more to learn here with respect to architecture, deployment techniques, monitoring, etc.
Finally, Infobright - it would make my life easier if we could use it on an insert/update/delete basis - as it is I think we’d have a tough time getting clients to pay the license fee - perhaps if bundled with something like EC2 instances with a smaller incremental cost then it may be more palatable and help to increase adoption (it may be that Infobright have lots of customers with open wallets - in which case please share them!).
In terms of immediate ease of use, with some visible performance improvements, Infobright fits the bill - but until I’ve had a chance to compare MonetDB and Infobright in a bit more detail then I’ll reserve my final judgement!