Rails 2.3.8, Rack 1.1 and the curious case of the missing quotes

If you're using Rails 2.3.8 for your application and thought that you were safe after May's comedy of errors produced three point updates in as many days, think again. Unfortunately there's a little bug that can lead to parameters being altered or potentially even truncated without warning.

Read more of this entry

Create a bootable EBS AMI from a running instance

A quick set of notes on how to create a bootable EBS snapshot from a running EC2 instance - for example, an instance that has been started from an S3 backed AMI.

We've had to do this a few times over the last few months - for the benefit of others, we've outlined how we currently do it - this is based on a number of articles that were surfacing at the time of our research, but I don't have the links to hand. If there's a better way out there feel free to jump in!

Read more of this entry

Ooh la la: Paperclip et les European S3 buckets

At the end of my last blog about Paperclip I mentioned that you need to do some patching if you want to use European S3 buckets to store your files. The problem was introduced when Paperclip made the move from RightAWS to Marcel Molina’s AWS::S3 gem. Unfortunately despite several forks containing patches to AWS::S3 and a 4 month old bug report nothing has been done to officially fix the problem.

So my fellow Europeans, what are we to do?

Read more of this entry

Conditional duplicate key updates with MySQL

In one of our larger Rails apps the sheer volume of data we process means we’ve had to rely more and more on direct SQL queries, denormalised tables and summary tables to speed things up. When updating summary tables we typically use ON DUPLICATE KEY UPDATE, a MySQL extension to INSERT statements since version 4.1, that allows a record to either be inserted or updated in one query.

Read more of this entry

InfiniDB, Infobright and MonetDB - Day 3: MonetDB

Day 3 of my database exploration mission brings me to MonetDB. Binary downloads are available for Debian, Fedora, Ubuntu and (strangely!) Windows! If we still had any Windows users left here at HQ then it'd be a rare treat, but instead (as usual) our platform of choice (Centos 5) isn't directly available in binary form. We downloaded the Fedora source RPMs and built our own - in case they're of any use then i've put them up on a Google Code site for others to download.

After installing the RPMs then you're ready to get started - before you can do anything you have to start the merovingian process (you could either setup an init script, or run the binary manually for now). For information, the instructions say:

merovingian is a daemon process that controls a collection of database servers, i.e. mserver5 processes, each looking after a single physical database. Start this program to gain access to your MonetDB database farm. merovingian is designed to be used in a system initialisation script in production environments.

With merovingian running then you're ready to create a database - for this you use monetdb - and then start the database using the same command for example:

> monetdb create twf
successfully created database 'twf'

> monetdb status
     name        state     
twf            stopped              
  
> monetdb start twf
starting database 'twf'... done

> monetdb status
     name        state     
twf            running

You now have a running database and can connect to it using mclient. This is similar to most command line clients where you can perform changes to your database as well as query for data.

The first step to transferring the database was as usual - inspect the schema on our MySQL database and update it to make the correct use of the supported data types. As with the other systems, there's no support for unsigned values, it also wasn't immediately obvious to me what the maximum length of a varchar is.

With the tables created it was time to try and migrate some data. Given MonetDB has been around for quite a while then there seemed to be pretty scarce resources with any detailed instructions - I couldn't, for example, find any simple migration tools or documentation detailing the best path for migration. I guess this could be because MonetDB is more often tackled by people with bigger brains or with more time to figure things out.

I attempted to use the following to dump data from MySQL:

select * from h into outfile '/dbtmp/tmp/h' fields terminated by "|" enclosed by '"';

And then the following to import into my MonetDB table:

copy 1000000 records into h from '/dbtmp/tmp/h' using delimiters '|','\n', '"'  null as '';

This yielded reasonable results - though I did have to do some tidying up in the middle with sed - in the end I gave up as there were some string values causing me problems, so I decided to rest on it and went to bed!

In the morning I came back to find the merovingian process was dead, and the status of the database was showing as crashed. I started up the processes and took a look at the status - it said the health was 67% so i'm not really sure what's going on with it!

Performance

In the time I had available I was only able to get a 1 million row table imported successfully to play with - a shocking performance I know, but MonetDB was being quite fussy and I wasn't pressing the right buttons! I did run a few tests and also ran them against the same dataset in MySQL for comparison, all are run from cold - i.e. MySQL and MonetDB are both restarted before each query. I don't expect these queries to be representative of real world cases, I was just thinking of some nasty queries that I could throw at a single table in order to cause some pain.

Query 1

MySQL takes 250msec:

sql>select count(*) from h;
+---------+
| L1      |
+=========+
| 1000000 |
+---------+
1 tuple
Timer       1.532 msec 1 rows
Query 2

MySQL takes 420msec:

sql>select count(*) from h group by intcolumn;
+-------+
| L2    |
+=======+
+-------+
65 tuples
Timer     142.260 msec 65 rows
Query 3

MySQL takes 44,000msec:

sql>select count(*) from h group by varcharcolumn;
+-------+
| L1    |
+=======+
+-------+
12743 tuples
Timer    1464.389 msec 12743 rows
Query 4

MySQL takes 37,500msec:

sql>select count(*) as total from h group by varcharcolumn order by total;
+-------+
| L1    |
+=======+
+-------+
12743 tuples
Timer    1496.537 msec 12743 rows
Query 5

MySQL takes 373,000msec (not a typo, it's more than 6 minutes):

sql>select count(*) as total from h group by varcharcolumn, anothervarcharcolumn order by total;
+-------+
| L1    |
+=======+
+-------+
69696 tuples
Timer    4170.520 msec 69696 rows

Summary

Obviously this quick trial of each of these is not comprehensive enough to make any solid comparisons of performance - the next step will be for me to go through and come up with a proper test plan in order to be a little more methodical about things. However, it has given me a good grounding in how the 3 systems compare with respect to installing and getting started. I'll be keeping a close eye on InfiniDB - while not stable enough right now, i'm sure they'll keep things rolling and I look forward to taking another look. If I can overcome the import obstacles and also the different 'feel' of MonetDB then the basic query results make a compelling case for taking a further look - there's also more to learn here with respect to architecture, deployment techniques, monitoring, etc. Finally, Infobright - it would make my life easier if we could use it on an insert/update/delete basis - as it is I think we'd have a tough time getting clients to pay the license fee - perhaps if bundled with something like EC2 instances with a smaller incremental cost then it may be more palletable and help to increase adoption (it may be that Infobright have lots of customers with open wallets - in which case please share them!). In terms of immediate ease of use, with some visible performance improvements, Infobright fits the bill - but until i've had a chance to compare MonetDB and Infobright in a bit more detail then i'll reserve my final judgement!

Discuss this entry

InfiniDB, Infobright and MonetDB - Day 2: Infobright

Day 2 of my tour of column based storage brings me on to Infobright Community Edition (ICE). The first impressive point was that based on my blog post of yesterday then I already had an email from Mark in Community Relations at Infobright offering help and advice - despite me calling him the wrong name (I was having a bad day!) then he was immediately helpful and also offered to get some of his team to look into my queries.

As an aside, John from Calpont was also kind enough to drop by to respond to some of my points - to me this gives me a warm fuzzy feeling that both Infobright and Calpont are taking the community seriously - I guess for these products to gain traction they need to make sure people can get motoring with them to improve adoption.

Read more of this entry

InfiniDB, Infobright and MonetDB - Day 1: InfiniDB

We're taking a whistlestop tour of some of the column based storage systems out there for a project we're working on (where the use case seems to fit better with this form of storage rather than straight MySQL). After reading through the series of articles on the MySQL Performance Blog then we chose to look at InfiniDB, Infobright and MonetDB - with the two that talk MySQL coming first for ease of integration right now. I'm also going to do this as a three parter - so first up is InfiniDB.

Read more of this entry

I need your designer glasses, your blue jeans and your black turtle-neck sweater

Picture of a new MacBook ProOk so it’s not quite Schwarzenegger but last week I terminated a twenty year relationship with Microsoft and bought a Mac. Now I just need to get hold of the Apple uniform and I’ll officially be part of the club!

After more than a year of pontification on what exactly to buy as a replacement for my ageing Dell Inspiron laptop, I finally settled on a shiny new 13” MacBook Pro and an even shinier 24” Cinema Display. So far I’m pretty chuffed with my choice.

Read more of this entry

Protecting your Paperclip downloads

Way back last November when I first blogged about Paperclip I included a brief mention of hiding files behind a controller rather than simply putting them in the public directory for all to see. Since then I’ve noticed that the question of how to actually do this has come up regularly over on Rails Forum and a couple of weeks ago I had to figure out how to update some of our code to protect assets that we had migrated from local file system to Amazon S3 storage. So I figured it’s probably a worthwhile technique to share.

Read more of this entry

Bugmash!

Picture of a masher mashing a bugWell it’s day two of the first ever Rails BugMash and so far I’ve managed to score a sneaky 1,000 points just by updating my one-line binary fixtures test patch. Meanwhile Matt Duncan and Rizwan Reza are on fire with 4,350 and 4,000 points at the time of writing.

Unfortunately the event has coincided with what may be the only nice weekend of the Great British summer, so I’ve been torn between the chance to mash bugs or to enjoy the sunshine. I’m currently trying to combine the two sat out in the garden squinting to see my laptop screen in the glare of the sun!

My next attempt to score some points is an updated patch, now improved and including a test case, for a lack of quoting of aliased table names in SQL joins which has been (too eagerly) marked as resolved even though it’s still broken. If you get the opportunity please do take a look and comment on the ticket as it’d be nice to get it fixed.

After that, I’m hoping to try and sneak my patch for anonymous extension modules for belongs_to and has_one associations into the bugmash as it has been sat on Lighthouse since March and already has three +1s. Even if it isn’t eligible for the bugmash, I still think it’s a worthy patch so again please take a look and comment on the ticket if you get chance.

And of course there are still plenty more tickets tagged with bugmash to be looked at so even if you’ve never contributed to Rails before, now is a pretty good time to start!

Discuss this entry

Thin, Rails 1.2.3/1.2.6 and ActionController::Dispatcher (NameError)

Please note: This patch has now been applied to the Thin master repository so will be fixed in all future releases.

Whilst trying to get an old Rails app up and running with Thin (Gem version 1.2.2) then I encountered a spot of bother:

load_missing_constant: uninitialized constant ActionController::Dispatcher (NameError)

Read more of this entry

Facebook Style Multi Select

Facebook Style Multi SelectWe've got a couple of projects on the go right now where a Facebook style multi-select (with or without autocomplete) is a good fit for making the user interface more, err, useable.

As ever, searching around there are a number of options - especially given the number of Javascript frameworks around. For us, things are also complicated by the fact that not all our projects use the same Javascript libraries - we're not always responsible for the front-end code and as such some clients already have a dependency on, or a preference for a given library.

Read more of this entry

Amazon CloudWatch, Auto Scaling and Elastic Load Balancing

Amazon have been busy again and today announced the release of three new services in their Web Services portfolio: monitoring, scaling and load balancing.

You can read the details in the blog post on the Amazon Web Services Blog.

Read more of this entry

Putting our Running Shoes on for Charity

In a change from our usual bums on seats position, we've decided to take part in a couple of running events this summer.

Read more of this entry

Using SQLite3 and MySQL on Windows

Way back in 2007, Rails 2.0.2 changed the default database from MySQL to SQLite3, making it easier for newcomers to get up and running with Rails quickly without having to worry about setting up a database. Unfortunately for Windows users, trying to install the SQLite3 gem with gem install sqlite3-ruby will result in the following error:

Building native extensions.  This could take a while…
ERROR:  Error installing sqlite3-ruby:
        ERROR: Failed to build gem native extension.

Read more of this entry

Archives

  1. July 2010
  2. April 2010
  3. January 2010
  4. November 2009
  5. September 2009
  6. August 2009
  7. July 2009
  8. June 2009
  9. May 2009
  10. April 2009
  11. March 2009
  12. February 2009
  13. December 2008
  14. November 2008
  15. September 2008
  16. August 2008
  17. July 2008
  18. June 2008
  19. May 2008
  20. April 2008
  21. March 2008
  22. February 2008
  23. January 2008
  24. December 2007
  25. November 2007

Tags

  1. actioncontrollerdispatcher (nameerror)
  2. actionview
  3. active messaging
  4. activerecord
  5. activesupport
  6. actverecord
  7. aes
  8. aggregation
  9. ajax
  10. akismet
  11. amazon
  12. amazon sqs
  13. ami
  14. apache
  15. api
  16. apple
  17. apr
  18. apr-util
  19. async
  20. attachments
  21. attachment_fu
  22. attr_accessible
  23. auto scaling
  24. autotest
  25. availability
  26. aws
  27. backgroundrb
  28. beanstalkd
  29. bindings
  30. bj
  31. block
  32. branding
  33. buckets
  34. bug
  35. bugmash
  36. cache
  37. caching
  38. callbacks
  39. cancer research uk
  40. cdn
  41. centos
  42. charity
  43. cloud
  44. cloudfront
  45. clusters
  46. column information
  47. columns
  48. community
  49. company name
  50. compatibility
  51. compiler
  52. composed_of
  53. consultancy
  54. content
  55. content delivery
  56. controller
  57. convert
  58. cookies
  59. csrf
  60. css
  61. data warehouse
  62. database
  63. dates
  64. defensio
  65. deployment
  66. design
  67. development
  68. dhtml
  69. docrails
  70. documentation
  71. donations
  72. drdb
  73. duplicate key
  74. ebs
  75. ec2
  76. elastic
  77. elastic block store
  78. elastic load balancing
  79. encoding
  80. encryption
  81. erb
  82. error
  83. european
  84. events
  85. ezcrypto
  86. facebook
  87. fckeditor
  88. feedburner
  89. feeds
  90. ffmpeg
  91. filter
  92. fixes
  93. flash
  94. flickr
  95. flickr api
  96. flickr_fu
  97. fuse
  98. geekup
  99. gems
  100. geocode
  101. git
  102. github
  103. god
  104. great south run
  105. greenplum
  106. growl
  107. hacker
  108. haml
  109. haproxy
  110. helper
  111. hmac
  112. holiday
  113. hooks
  114. hosting
  115. howto
  116. hpricot
  117. html
  118. identity
  119. imagemagick
  120. imagescience
  121. infinidb
  122. infiniteftp
  123. infobright
  124. init.d
  125. insert
  126. invalid authenticity token
  127. italy
  128. javascript
  129. jquery
  130. json
  131. leeds media
  132. limit
  133. linux
  134. load balancing
  135. logo
  136. mac
  137. markaby
  138. mass-assignment
  139. memcached
  140. mephisto
  141. messaging
  142. middleware
  143. migrate
  144. migration
  145. model
  146. mod_rails
  147. mod_ruby
  148. monetdb
  149. mongrel
  150. mongrel_cluster
  151. monit
  152. monitoring
  153. mootools
  154. mp3
  155. mq
  156. multiple gems
  157. multiselect
  158. mysql
  159. neon
  160. new site
  161. nginx
  162. observer
  163. offset
  164. open source
  165. opensolaris
  166. openssl
  167. optimisation
  168. paperclip
  169. parameters
  170. params
  171. passenger
  172. patch
  173. performance
  174. permanentredirect
  175. persistence
  176. persistent storage
  177. persistentfs
  178. php
  179. phusion
  180. plugin
  181. plugins
  182. post commit
  183. post-commit
  184. protomultiselect
  185. prototype
  186. query
  187. queues
  188. quotes
  189. race for life
  190. rack
  191. rails
  192. rails development
  193. rails patch
  194. rails plugin
  195. rails-doc
  196. rake
  197. refresh
  198. renderer
  199. respond_to
  200. rich text editor
  201. rmagick
  202. ruby
  203. ruby on rails
  204. rubyinline
  205. running
  206. rvideo
  207. s3
  208. s3fs. elasticdrive
  209. scaling
  210. schema
  211. schwarzenegger
  212. scm
  213. security
  214. services
  215. session
  216. shorthand
  217. snarl
  218. social
  219. solaris
  220. spam filter
  221. sparrow
  222. specify
  223. sponsorship
  224. sql
  225. sqlite3
  226. sql_logging
  227. starling
  228. starter kit
  229. storage
  230. streaming
  231. subversion
  232. sue ryder care
  233. survey
  234. svn
  235. swfupload
  236. swig
  237. sysadmin
  238. tables
  239. tamper
  240. templates
  241. the webfellas
  242. thewebfellas
  243. thin
  244. thumbnail
  245. time zone
  246. tinymce
  247. tip
  248. tips
  249. to-done
  250. training
  251. transcoding
  252. twitter
  253. tzinfo
  254. uk
  255. uk rails
  256. unsigned
  257. update
  258. uploads
  259. url
  260. validation
  261. version
  262. video
  263. view
  264. vmdk
  265. vmware
  266. webfellas
  267. webfellows
  268. wedding
  269. welcome
  270. widgeditor
  271. will_paginate
  272. win32
  273. windows
  274. wysiwyg
  275. xen
  276. xhtml
  277. xvm
  278. youtube
  279. zenoss
  280. zentest
  281. zfs

Flickr snaps