strapyourself.in and flouri.sh
Writing data migrations in Rails
Using ActiveRecord can be tricky; here's how
Data migrations are a major headache in every Rails project I've worked on. Developers typically write straight SQL migrations, which take much longer to create and test, or they use ActiveRecord and run into problems. I recently wrote a data migration that took 7 existing tables and compressed them into 4. Some of the 4 new tables had the same names as existing models, so I finally figured out how to do this safely with ActiveRecord. Here's my advice:
- When changing the schema of an existing set of tables, create new tables and then rename them.
- Include ActiveRecord class fragments inside the migration class, at the top of your migrations.
- Put your data migrations inside transactions.
- Make your data migrations completely reversible (which is a lot easier when you follow the first rule).
Here's an example of an ActiveRecord class fragment at the top of a migration:
    class MarkPrimaryBits < ActiveRecord::Migration
      class LessonVersion < ActiveRecord::Base
        belongs_to :lesson
      end

      class Lesson < ActiveRecord::Base
        has_many :lesson_versions
      end

      def self.up
        ...
By putting the Lesson and LessonVersion classes at the top of the migration, I'm allowing myself to use those classes inside my migration and be completely independent of any changes made to the real model from then on (including the deletion of the class itself). Furthermore, I can have another migration which uses those same class names with completely different meanings and they won't conflict with each other.
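The isolation described above comes from Ruby's lexical constant lookup, which can be seen without ActiveRecord at all. Here's a minimal sketch in plain Ruby. The top-level Lesson class, the SplitLessons migration, and the source method are hypothetical stand-ins I made up for illustration; in a real migration the outer classes would inherit from ActiveRecord::Migration and the fragments from ActiveRecord::Base.

```ruby
# Hypothetical stand-in for the application's real model.
class Lesson
  def self.source
    "app model"
  end
end

# Stand-in for a migration class (would inherit from ActiveRecord::Migration).
class MarkPrimaryBits
  # Fragment that only exists inside this migration's namespace.
  class Lesson
    def self.source
      "fragment in MarkPrimaryBits"
    end
  end

  def self.up
    # Constant lookup starts in the enclosing class, so this resolves to
    # MarkPrimaryBits::Lesson rather than the top-level ::Lesson.
    Lesson.source
  end
end

# A second (hypothetical) migration reusing the same name with a different meaning.
class SplitLessons
  class Lesson
    def self.source
      "fragment in SplitLessons"
    end
  end

  def self.up
    Lesson.source
  end
end
```

Calling MarkPrimaryBits.up and SplitLessons.up each picks up its own nested Lesson, while code outside the migrations still sees the top-level class.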
Extending has_many associations correctly
How to build custom methods on associations that aren't slow
The "has_many do" syntax has been widely adopted in Rails, but I often see it going wrong. Consider the following association extension:
    has_many :versions do
      def primary
        find(:first, :conditions => {:primary => true})
      end
    end
What's wrong with this code is that the finder has to execute a database request every time it's invoked. True, Rails has a query cache, but this still results in more database requests than one should need. We can make it better by caching the result in an instance variable attached to the association:
    has_many :versions do
      def primary
        @primary ||= find(:first, :conditions => {:primary => true})
      end
    end
But what if you already have the association loaded? Inside the block, you can access various methods of the AssociationCollection and AssociationProxy classes. Notable methods are the following:
- proxy_owner - the object (model instance) that owns the association
- proxy_reflection - the Reflection object that contains the association options (FK, :dependent, etc)
- proxy_target - the cached association data, if the association has been loaded
- loaded? - returns true if the association has been loaded
- reset - delete the cached association data and forget it has been loaded
Using these methods, we can rewrite our method to be as efficient as possible, by making use of loaded association data when present:
    has_many :versions do
      def primary
        if loaded?
          @primary ||= proxy_target.detect { |ver| ver.primary? }
        else
          @primary ||= find(:first, :conditions => {:primary => true})
        end
      end
    end
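Now our method is guaranteed to make at most one database call once a primary version has been found, no matter how many times it is invoked (note that ||= does not cache a nil result, so a missing primary is looked up again on each call). It also will make zero database calls if the entire association has already been loaded. The only downside is that the "how to determine if primary" logic has to be written twice: once as SQL conditions for the finder and once in pure Ruby.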
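To make the query counts concrete, here's a rough simulation in plain Ruby. It doesn't use Rails: FakeVersionsAssociation, Version, load_target as written here, find_first_primary, and query_count are all hypothetical stand-ins I wrote for illustration, with loaded? and proxy_target only mimicking the proxy methods listed above. The primary method has the same branching as the extension, just folded into a single ||= assignment.

```ruby
# Hypothetical row object standing in for a version record.
Version = Struct.new(:id, :primary) do
  def primary?
    primary
  end
end

# Hypothetical stand-in for an association proxy; it counts simulated queries.
class FakeVersionsAssociation
  attr_reader :query_count

  def initialize(rows)
    @rows = rows        # pretend database table
    @target = nil       # cached association data
    @loaded = false
    @query_count = 0
  end

  def loaded?
    @loaded
  end

  def proxy_target
    @target
  end

  # Simulates loading the whole association with one query.
  def load_target
    @query_count += 1
    @target = @rows.dup
    @loaded = true
    @target
  end

  # Simulates find(:first, :conditions => {:primary => true}).
  def find_first_primary
    @query_count += 1
    @rows.detect { |row| row.primary }
  end

  # Same logic as the extension method: use loaded data when present,
  # otherwise query, and memoize the result either way.
  def primary
    @primary ||= if loaded?
      proxy_target.detect { |ver| ver.primary? }
    else
      find_first_primary
    end
  end
end

rows = [Version.new(1, false), Version.new(2, true)]

# Association not loaded: repeated calls share a single simulated query.
unloaded = FakeVersionsAssociation.new(rows)
3.times { unloaded.primary }

# Association already loaded: primary adds no queries beyond the load itself.
preloaded = FakeVersionsAssociation.new(rows)
preloaded.load_target
3.times { preloaded.primary }
```

After this runs, unloaded.query_count is 1 (the single find) and preloaded.query_count is also 1 (just the load), which matches the behavior described for the real extension.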