Ooh la la: Paperclip et les European S3 buckets

At the end of my last blog about Paperclip I mentioned that you need to do some patching if you want to use European S3 buckets to store your files. The problem was introduced when Paperclip made the move from RightAWS to Marcel Molina’s AWS::S3 gem. Unfortunately despite several forks containing patches to AWS::S3 and a 4 month old bug report nothing has been done to officially fix the problem.

So my fellow Europeans, what are we to do?

Read more of this entry

Conditional duplicate key updates with MySQL

In one of our larger Rails apps the sheer volume of data we process means we’ve had to rely more and more on direct SQL queries, denormalised tables and summary tables to speed things up. When updating summary tables we typically use ON DUPLICATE KEY UPDATE, a MySQL extension to INSERT statements since version 4.1, that allows a record to either be inserted or updated in one query.

Read more of this entry

InfiniDB, Infobright and MonetDB - Day 3: MonetDB

Day 3 of my database exploration mission brings me to MonetDB. Binary downloads are available for Debian, Fedora, Ubuntu and (strangely!) Windows! If we still had any Windows users left here at HQ then it'd be a rare treat, but instead (as usual) our platform of choice (Centos 5) isn't directly available in binary form. We downloaded the Fedora source RPMs and built our own - in case they're of any use then i've put them up on a Google Code site for others to download.

After installing the RPMs then you're ready to get started - before you can do anything you have to start the merovingian process (you could either setup an init script, or run the binary manually for now). For information, the instructions say:

merovingian is a daemon process that controls a collection of database servers, i.e. mserver5 processes, each looking after a single physical database. Start this program to gain access to your MonetDB database farm. merovingian is designed to be used in a system initialisation script in production environments.

With merovingian running then you're ready to create a database - for this you use monetdb - and then start the database using the same command for example:

> monetdb create twf
successfully created database 'twf'

> monetdb status
     name        state     
twf            stopped              
  
> monetdb start twf
starting database 'twf'... done

> monetdb status
     name        state     
twf            running

You now have a running database and can connect to it using mclient. This is similar to most command line clients where you can perform changes to your database as well as query for data.

The first step to transferring the database was as usual - inspect the schema on our MySQL database and update it to make the correct use of the supported data types. As with the other systems, there's no support for unsigned values, it also wasn't immediately obvious to me what the maximum length of a varchar is.

With the tables created it was time to try and migrate some data. Given MonetDB has been around for quite a while then there seemed to be pretty scarce resources with any detailed instructions - I couldn't, for example, find any simple migration tools or documentation detailing the best path for migration. I guess this could be because MonetDB is more often tackled by people with bigger brains or with more time to figure things out.

I attempted to use the following to dump data from MySQL:

select * from h into outfile '/dbtmp/tmp/h' fields terminated by "|" enclosed by '"';

And then the following to import into my MonetDB table:

copy 1000000 records into h from '/dbtmp/tmp/h' using delimiters '|','\n', '"'  null as '';

This yielded reasonable results - though I did have to do some tidying up in the middle with sed - in the end I gave up as there were some string values causing me problems, so I decided to rest on it and went to bed!

In the morning I came back to find the merovingian process was dead, and the status of the database was showing as crashed. I started up the processes and took a look at the status - it said the health was 67% so i'm not really sure what's going on with it!

Performance

In the time I had available I was only able to get a 1 million row table imported successfully to play with - a shocking performance I know, but MonetDB was being quite fussy and I wasn't pressing the right buttons! I did run a few tests and also ran them against the same dataset in MySQL for comparison, all are run from cold - i.e. MySQL and MonetDB are both restarted before each query. I don't expect these queries to be representative of real world cases, I was just thinking of some nasty queries that I could throw at a single table in order to cause some pain.

Query 1

MySQL takes 250msec:

sql>select count(*) from h;
+---------+
| L1      |
+=========+
| 1000000 |
+---------+
1 tuple
Timer       1.532 msec 1 rows
Query 2

MySQL takes 420msec:

sql>select count(*) from h group by intcolumn;
+-------+
| L2    |
+=======+
+-------+
65 tuples
Timer     142.260 msec 65 rows
Query 3

MySQL takes 44,000msec:

sql>select count(*) from h group by varcharcolumn;
+-------+
| L1    |
+=======+
+-------+
12743 tuples
Timer    1464.389 msec 12743 rows
Query 4

MySQL takes 37,500msec:

sql>select count(*) as total from h group by varcharcolumn order by total;
+-------+
| L1    |
+=======+
+-------+
12743 tuples
Timer    1496.537 msec 12743 rows
Query 5

MySQL takes 373,000msec (not a typo, it's more than 6 minutes):

sql>select count(*) as total from h group by varcharcolumn, anothervarcharcolumn order by total;
+-------+
| L1    |
+=======+
+-------+
69696 tuples
Timer    4170.520 msec 69696 rows

Summary

Obviously this quick trial of each of these is not comprehensive enough to make any solid comparisons of performance - the next step will be for me to go through and come up with a proper test plan in order to be a little more methodical about things. However, it has given me a good grounding in how the 3 systems compare with respect to installing and getting started. I'll be keeping a close eye on InfiniDB - while not stable enough right now, i'm sure they'll keep things rolling and I look forward to taking another look. If I can overcome the import obstacles and also the different 'feel' of MonetDB then the basic query results make a compelling case for taking a further look - there's also more to learn here with respect to architecture, deployment techniques, monitoring, etc. Finally, Infobright - it would make my life easier if we could use it on an insert/update/delete basis - as it is I think we'd have a tough time getting clients to pay the license fee - perhaps if bundled with something like EC2 instances with a smaller incremental cost then it may be more palletable and help to increase adoption (it may be that Infobright have lots of customers with open wallets - in which case please share them!). In terms of immediate ease of use, with some visible performance improvements, Infobright fits the bill - but until i've had a chance to compare MonetDB and Infobright in a bit more detail then i'll reserve my final judgement!

Discuss this entry

InfiniDB, Infobright and MonetDB - Day 2: Infobright

Day 2 of my tour of column based storage brings me on to Infobright Community Edition (ICE). The first impressive point was that based on my blog post of yesterday then I already had an email from Mark in Community Relations at Infobright offering help and advice - despite me calling him the wrong name (I was having a bad day!) then he was immediately helpful and also offered to get some of his team to look into my queries.

As an aside, John from Calpont was also kind enough to drop by to respond to some of my points - to me this gives me a warm fuzzy feeling that both Infobright and Calpont are taking the community seriously - I guess for these products to gain traction they need to make sure people can get motoring with them to improve adoption.

Read more of this entry

InfiniDB, Infobright and MonetDB - Day 1: InfiniDB

We're taking a whistlestop tour of some of the column based storage systems out there for a project we're working on (where the use case seems to fit better with this form of storage rather than straight MySQL). After reading through the series of articles on the MySQL Performance Blog then we chose to look at InfiniDB, Infobright and MonetDB - with the two that talk MySQL coming first for ease of integration right now. I'm also going to do this as a three parter - so first up is InfiniDB.

Read more of this entry

I need your designer glasses, your blue jeans and your black turtle-neck sweater

Picture of a new MacBook ProOk so it’s not quite Schwarzenegger but last week I terminated a twenty year relationship with Microsoft and bought a Mac. Now I just need to get hold of the Apple uniform and I’ll officially be part of the club!

After more than a year of pontification on what exactly to buy as a replacement for my ageing Dell Inspiron laptop, I finally settled on a shiny new 13” MacBook Pro and an even shinier 24” Cinema Display. So far I’m pretty chuffed with my choice.

Read more of this entry

Protecting your Paperclip downloads

Way back last November when I first blogged about Paperclip I included a brief mention of hiding files behind a controller rather than simply putting them in the public directory for all to see. Since then I’ve noticed that the question of how to actually do this has come up regularly over on Rails Forum and a couple of weeks ago I had to figure out how to update some of our code to protect assets that we had migrated from local file system to Amazon S3 storage. So I figured it’s probably a worthwhile technique to share.

Read more of this entry

Bugmash!

Picture of a masher mashing a bugWell it’s day two of the first ever Rails BugMash and so far I’ve managed to score a sneaky 1,000 points just by updating my one-line binary fixtures test patch. Meanwhile Matt Duncan and Rizwan Reza are on fire with 4,350 and 4,000 points at the time of writing.

Unfortunately the event has coincided with what may be the only nice weekend of the Great British summer, so I’ve been torn between the chance to mash bugs or to enjoy the sunshine. I’m currently trying to combine the two sat out in the garden squinting to see my laptop screen in the glare of the sun!

My next attempt to score some points is an updated patch, now improved and including a test case, for a lack of quoting of aliased table names in SQL joins which has been (too eagerly) marked as resolved even though it’s still broken. If you get the opportunity please do take a look and comment on the ticket as it’d be nice to get it fixed.

After that, I’m hoping to try and sneak my patch for anonymous extension modules for belongs_to and has_one associations into the bugmash as it has been sat on Lighthouse since March and already has three +1s. Even if it isn’t eligible for the bugmash, I still think it’s a worthy patch so again please take a look and comment on the ticket if you get chance.

And of course there are still plenty more tickets tagged with bugmash to be looked at so even if you’ve never contributed to Rails before, now is a pretty good time to start!

Discuss this entry

Thin, Rails 1.2.3/1.2.6 and ActionController::Dispatcher (NameError)

Please note: This patch has now been applied to the Thin master repository so will be fixed in all future releases.

Whilst trying to get an old Rails app up and running with Thin (Gem version 1.2.2) then I encountered a spot of bother:

load_missing_constant: uninitialized constant ActionController::Dispatcher (NameError)

Read more of this entry

Facebook Style Multi Select

Facebook Style Multi SelectWe've got a couple of projects on the go right now where a Facebook style multi-select (with or without autocomplete) is a good fit for making the user interface more, err, useable.

As ever, searching around there are a number of options - especially given the number of Javascript frameworks around. For us, things are also complicated by the fact that not all our projects use the same Javascript libraries - we're not always responsible for the front-end code and as such some clients already have a dependency on, or a preference for a given library.

Read more of this entry

Amazon CloudWatch, Auto Scaling and Elastic Load Balancing

Amazon have been busy again and today announced the release of three new services in their Web Services portfolio: monitoring, scaling and load balancing.

You can read the details in the blog post on the Amazon Web Services Blog.

Read more of this entry

Putting our Running Shoes on for Charity

In a change from our usual bums on seats position, we've decided to take part in a couple of running events this summer.

Read more of this entry

Using SQLite3 and MySQL on Windows

Way back in 2007, Rails 2.0.2 changed the default database from MySQL to SQLite3, making it easier for newcomers to get up and running with Rails quickly without having to worry about setting up a database. Unfortunately for Windows users, trying to install the SQLite3 gem with gem install sqlite3-ruby will result in the following error:

Building native extensions.  This could take a while…
ERROR:  Error installing sqlite3-ruby:
        ERROR: Failed to build gem native extension.

Read more of this entry

Accessing column information with ActiveRecord

A quick tip - ActiveRecord provides a number of methods that allow you to access information about the underlying table columns for a model.

Read more of this entry

Prototype, emptyFunction and callbacks

Let’s say we have a JavaScript function that looks like this:

function plot(options) {
  options = options || {};
  this.onUpdate = options.onUpdate;

  //
  // more code to do something interesting
  //
}

Read more of this entry

Archives

  1. January 2010
  2. November 2009
  3. September 2009
  4. August 2009
  5. July 2009
  6. June 2009
  7. May 2009
  8. April 2009
  9. March 2009
  10. February 2009
  11. December 2008
  12. November 2008
  13. September 2008
  14. August 2008
  15. July 2008
  16. June 2008
  17. May 2008
  18. April 2008
  19. March 2008
  20. February 2008
  21. January 2008
  22. December 2007
  23. November 2007

Tags

  1. actioncontrollerdispatcher (nameerror)
  2. actionview
  3. active messaging
  4. activerecord
  5. activesupport
  6. actverecord
  7. aes
  8. aggregation
  9. ajax
  10. akismet
  11. amazon
  12. amazon sqs
  13. ami
  14. apache
  15. api
  16. apple
  17. apr
  18. apr-util
  19. async
  20. attachments
  21. attachment_fu
  22. attr_accessible
  23. auto scaling
  24. autotest
  25. availability
  26. aws
  27. backgroundrb
  28. beanstalkd
  29. bindings
  30. bj
  31. block
  32. branding
  33. buckets
  34. bug
  35. bugmash
  36. cache
  37. caching
  38. callbacks
  39. cancer research uk
  40. cdn
  41. centos
  42. charity
  43. cloud
  44. cloudfront
  45. clusters
  46. column information
  47. columns
  48. community
  49. company name
  50. compatibility
  51. compiler
  52. composed_of
  53. consultancy
  54. content
  55. content delivery
  56. controller
  57. convert
  58. cookies
  59. csrf
  60. css
  61. data warehouse
  62. database
  63. dates
  64. defensio
  65. deployment
  66. design
  67. development
  68. dhtml
  69. docrails
  70. documentation
  71. donations
  72. drdb
  73. duplicate key
  74. ebs
  75. ec2
  76. elastic
  77. elastic block store
  78. elastic load balancing
  79. encoding
  80. encryption
  81. erb
  82. error
  83. european
  84. events
  85. ezcrypto
  86. facebook
  87. fckeditor
  88. feedburner
  89. feeds
  90. ffmpeg
  91. filter
  92. fixes
  93. flash
  94. flickr
  95. flickr api
  96. flickr_fu
  97. fuse
  98. geekup
  99. gems
  100. geocode
  101. git
  102. github
  103. god
  104. great south run
  105. greenplum
  106. growl
  107. hacker
  108. haml
  109. haproxy
  110. helper
  111. hmac
  112. holiday
  113. hooks
  114. hosting
  115. howto
  116. hpricot
  117. html
  118. identity
  119. imagemagick
  120. imagescience
  121. infinidb
  122. infiniteftp
  123. infobright
  124. init.d
  125. insert
  126. invalid authenticity token
  127. italy
  128. javascript
  129. jquery
  130. json
  131. leeds media
  132. limit
  133. linux
  134. load balancing
  135. logo
  136. mac
  137. markaby
  138. mass-assignment
  139. memcached
  140. mephisto
  141. messaging
  142. middleware
  143. migrate
  144. migration
  145. model
  146. mod_rails
  147. mod_ruby
  148. monetdb
  149. mongrel
  150. mongrel_cluster
  151. monit
  152. monitoring
  153. mootools
  154. mp3
  155. mq
  156. multiple gems
  157. multiselect
  158. mysql
  159. neon
  160. new site
  161. nginx
  162. observer
  163. offset
  164. open source
  165. opensolaris
  166. openssl
  167. optimisation
  168. paperclip
  169. parameters
  170. passenger
  171. patch
  172. performance
  173. permanentredirect
  174. persistence
  175. persistent storage
  176. persistentfs
  177. php
  178. phusion
  179. plugin
  180. plugins
  181. post commit
  182. post-commit
  183. protomultiselect
  184. prototype
  185. query
  186. queues
  187. race for life
  188. rack
  189. rails
  190. rails development
  191. rails patch
  192. rails plugin
  193. rails-doc
  194. rake
  195. refresh
  196. renderer
  197. respond_to
  198. rich text editor
  199. rmagick
  200. ruby
  201. ruby on rails
  202. rubyinline
  203. running
  204. rvideo
  205. s3
  206. s3fs. elasticdrive
  207. scaling
  208. schema
  209. schwarzenegger
  210. scm
  211. security
  212. services
  213. session
  214. shorthand
  215. snarl
  216. social
  217. solaris
  218. spam filter
  219. sparrow
  220. specify
  221. sponsorship
  222. sql
  223. sqlite3
  224. sql_logging
  225. starling
  226. starter kit
  227. storage
  228. streaming
  229. subversion
  230. sue ryder care
  231. survey
  232. svn
  233. swfupload
  234. swig
  235. sysadmin
  236. tables
  237. tamper
  238. templates
  239. the webfellas
  240. thewebfellas
  241. thin
  242. thumbnail
  243. time zone
  244. tinymce
  245. tip
  246. tips
  247. to-done
  248. training
  249. transcoding
  250. twitter
  251. tzinfo
  252. uk
  253. uk rails
  254. unsigned
  255. update
  256. uploads
  257. url
  258. validation
  259. version
  260. video
  261. view
  262. vmdk
  263. vmware
  264. webfellas
  265. webfellows
  266. wedding
  267. welcome
  268. widgeditor
  269. will_paginate
  270. win32
  271. windows
  272. wysiwyg
  273. xen
  274. xhtml
  275. xvm
  276. youtube
  277. zenoss
  278. zentest
  279. zfs

Projects

Flickr snaps