ParsaLabs | Blog

A publication about the web and more.

Batch Processing Database Records

Using the all method to loop through a large collection of database records is inefficient, because it instantiates every record at once. On large data sets this can consume a lot of memory. The solution is to use one of the batch processing methods in Rails, which load the records in fixed-size groups.

So, instead of doing this:

User.all.each do |user|
  user.do_sth
end

...use a batch processing method, like so:

User.find_each(batch_size: 5000) do |user| # the default batch size is 1000
  user.do_sth
end

You can also chain it after other query methods such as .where().
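To make the memory argument concrete, here is a minimal plain-Ruby sketch of the idea behind batching: fetch rows in primary key order, a limited number at a time, and resume after the last ID seen. It is an in-memory illustration only, not the Rails implementation; Row and each_in_batches are hypothetical names made up for this example.

```ruby
# Hypothetical in-memory stand-in for a database table (not ActiveRecord).
Row = Struct.new(:id, :name)

# Yields every row to the block, but "loads" at most batch_size rows per
# pass. Returns the number of non-empty batches that were fetched.
def each_in_batches(table, batch_size: 1000, start: nil, finish: nil)
  last_id = nil
  batches = 0
  loop do
    # Roughly what a query like
    #   WHERE id > last_id ORDER BY id LIMIT batch_size
    # would return, with optional inclusive lower and upper ID bounds.
    batch = table
            .select do |row|
              (last_id.nil? || row.id > last_id) &&
                (start.nil? || row.id >= start) &&
                (finish.nil? || row.id <= finish)
            end
            .sort_by(&:id)
            .first(batch_size)
    break if batch.empty?

    batches += 1
    batch.each { |row| yield row }
    last_id = batch.last.id
    break if batch.size < batch_size # a short batch means nothing is left
  end
  batches
end

# Usage: 25 rows are processed in groups of at most 10.
table = (1..25).map { |i| Row.new(i, "user#{i}") }
names = []
each_in_batches(table, batch_size: 10) { |row| names << row.name }
```

Because each pass only needs the last ID from the previous one, the number of records held in memory at any time is bounded by the batch size rather than by the size of the table.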

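Other options you can pass to .find_each() are start and finish, which set the first and last ID (primary key) of the sequence. Both bounds are inclusive. In released versions of Rails (5.0 and later) the end option is called finish; end_at was its name during Rails 5.0 development.

User.find_each(start: 0, finish: 10000, batch_size: 500) do |user|
  user.do_sth
end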

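This is particularly useful if, for instance, you need worker 1 to handle records with IDs 0 through 10,000 and worker 2 to handle those from 10,001 onward. Both bounds are inclusive, so starting worker 2 at 10,000 would process that record twice.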
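If you have more than two workers, the boundaries can be computed rather than hard-coded. The following is a small sketch, assuming you already know the lowest and highest IDs to cover (for example from User.minimum(:id) and User.maximum(:id)); id_ranges is a hypothetical helper, not part of Rails.

```ruby
# Hypothetical helper: splits the inclusive ID range min_id..max_id into
# at most `workers` contiguous, non-overlapping [first, last] pairs.
def id_ranges(min_id, max_id, workers)
  total = max_id - min_id + 1
  size = (total.to_f / workers).ceil # IDs per worker, rounded up

  ranges = (0...workers).map do |i|
    first = min_id + i * size
    last = [first + size - 1, max_id].min
    [first, last]
  end

  # With more workers than IDs, trailing ranges would be empty; drop them.
  ranges.reject { |first, last| first > last }
end

ranges = id_ranges(1, 20_000, 2)

# Each worker would then run something like the following
# (assumes Rails 5.0+ and the User model from this post):
#
#   first, last = ranges[worker_index]
#   User.find_each(start: first, finish: last) { |user| user.do_sth }
```

Note that this splits the ID space evenly, not the records themselves: if many rows have been deleted, some workers may end up with fewer records than others.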
