Optimizing Bulk Create Operations in Rails
Recently, my team ran into performance issues while handling bulk CUD (Create, Update, Delete) operations in a Rails application.
Active Record provides validations and callbacks, which are excellent tools for maintaining data consistency. Rails controllers also offer convenient create, update, and destroy actions that work very well for single-record workflows. However, in most applications where users manage tabular data, bulk actions are unavoidable. Users upload CSV files, select multiple rows, or trigger batch actions in a single request. In these cases, Active Record’s per-record lifecycle can quickly become a performance bottleneck.
In this post, I focus only on bulk create. I will cover update and delete in separate posts.
Example: Why Bulk Create Is Expensive
Consider the following models:
// language: ruby
class Tag < ApplicationRecord
end

class Office < ApplicationRecord
end

class Book < ApplicationRecord
  has_many :book_tags
  has_many :tags, through: :book_tags
  belongs_to :office

  validates :name, length: { maximum: 200 }, presence: true, uniqueness: true
  validates :tags, presence: true
  validates :office, presence: true
  validate :office_must_be_active

  after_create :sync_search_index

  private

  def office_must_be_active
    return if office.nil?

    errors.add(:office, "is inactive") unless office.active?
  end

  def sync_search_index
    BookSearchIndexer.perform(id: id)
  end
end
Creating a single record works fine:
// language: ruby
Book.create(name: "Math Basic", office_id: 100, tag_ids: [10, 12])
Now imagine importing 100 books from a CSV file:
// language: ruby
require "csv"

CSV.foreach("books.csv", headers: true) do |row|
  Book.create(
    name: row["name"],
    office_id: row["office_id"],
    tag_ids: row["tag_ids"].split(",")
  )
end
For each book, Active Record may trigger:
- One SELECT for name uniqueness
- One SELECT to load the office for presence and custom validation
- One or more SELECTs for tags
- One INSERT into books
- Multiple INSERTs into book_tags
- All callbacks, including BookSearchIndexer.perform
Multiply this by 100 rows, and the cost becomes obvious.
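If you want to see the cost concretely, you can count queries during the import with ActiveSupport::Notifications. This is a rough measurement sketch; the exact total depends on your schema and query caching:
// language: ruby
require "csv"

query_count = 0
subscriber = ActiveSupport::Notifications.subscribe("sql.active_record") do |_name, _start, _finish, _id, payload|
  # Skip queries served from the query cache
  query_count += 1 unless payload[:cached]
end

CSV.foreach("books.csv", headers: true) do |row|
  Book.create(
    name: row["name"],
    office_id: row["office_id"],
    tag_ids: row["tag_ids"].split(",")
  )
end

ActiveSupport::Notifications.unsubscribe(subscriber)
puts "Issued #{query_count} queries"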
activerecord-import Is Fast, But Not Always a Fit
The activerecord-import gem is often suggested for bulk inserts. It is fast and uses bulk SQL under the hood.
// language: ruby
require "csv"

books = CSV.read("books.csv", headers: true).map do |row|
  Book.new(
    name: row["name"],
    office_id: row["office_id"],
    tag_ids: row["tag_ids"].split(",")
  )
end

Book.import(books, validate: true)
It builds a single multi-row INSERT under the hood, lets you toggle validations, and skips Active Record callbacks such as after_create entirely. While this is fast, it is often a poor fit for public APIs or user-uploaded data, where correctness, predictable behavior, and clear error reporting matter more than raw speed.
In most real-world cases, we still need validations and callbacks.
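For reference, import reports validation failures through the result object it returns rather than raising per record, and callbacks still never run. A sketch based on the gem's result interface:
// language: ruby
result = Book.import(books, validate: true)

result.failed_instances.each do |book|
  puts book.errors.full_messages.join(", ")
end
# Note: after_create (sync_search_index) did not run for any inserted row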
Optimized Bulk Create Strategy
The idea is simple: move expensive work out of the per-record lifecycle and do it once, in batches.
Core principles:
- Validate data in batches upfront
- Avoid repeated queries per record
- Keep database constraints as the final safety net
- Skip expensive per-record validations and callbacks during bulk inserts
1. Always Start With Database Constraints
Database constraints are non-negotiable. They are the final line of defense.
// language: ruby
add_index :books, :name, unique: true
add_foreign_key :books, :offices
add_foreign_key :book_tags, :books
add_foreign_key :book_tags, :tags
Why this matters:
- Protects data even if validations are skipped
- Handles race conditions correctly
- Covers bugs, direct SQL, or missed checks
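For example, if two concurrent requests both pass the uniqueness validation, the unique index still rejects the second INSERT, and you can handle that explicitly (a minimal sketch):
// language: ruby
begin
  Book.create!(name: "Math Basic", office_id: 100, tag_ids: [10, 12])
rescue ActiveRecord::RecordNotUnique
  # The unique index caught a duplicate that slipped past validation,
  # e.g. two requests inserting the same name at the same time
end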
2. Move Side Effects to Background Jobs
In the example above, BookSearchIndexer.perform might call Elasticsearch or another external service. This is expensive and unnecessary during bulk inserts. If indexing is not required immediately, move it to a background job:
// language: ruby
BookSearchIndexer.perform_later(id: id)
For bulk operations, it is often better to index once for all records instead of once per record.
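One possible shape for that is a single Active Job that receives all the ids, assuming BookSearchIndexer is an Active Job; the indexing call itself is a placeholder, and the single-record callback can pass ids: [id]:
// language: ruby
class BookSearchIndexer < ApplicationJob
  queue_as :default

  def perform(ids:)
    # One job (and ideally one request to the search engine)
    # for the whole batch, instead of one per record
    Book.where(id: ids).find_each do |book|
      # push book to Elasticsearch / your search service here
    end
  end
end

# created_book_ids is a placeholder for the ids collected during the import
BookSearchIndexer.perform_later(ids: created_book_ids)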
3. Make Expensive Validations and Callbacks Skippable
Some validations are expensive when repeated many times, such as uniqueness checks or external side effects. Make them conditional:
// language: ruby
class Book < ApplicationRecord
  attr_accessor :skip_name_uniqueness, :skip_sync_search_index

  has_many :book_tags
  has_many :tags, through: :book_tags
  belongs_to :office

  validates :name, presence: true
  validates :name, uniqueness: true, unless: :skip_name_uniqueness
  validate :office_must_be_active

  after_create :sync_search_index, unless: :skip_sync_search_index

  private

  def office_must_be_active
    return if office.nil?

    errors.add(:office, "is inactive") unless office.active?
  end

  def sync_search_index
    BookSearchIndexer.perform_later(id: id)
  end
end
Rule of thumb:
- Single create: run all validations and callbacks
- Bulk create: validate once upfront, skip per-record expensive checks and run bulk callbacks
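In practice, the two paths look like this (office and tags are assumed to be objects preloaded upfront, as described in the next step):
// language: ruby
# Single create: full lifecycle, all checks and callbacks run
Book.create!(name: "Math Basic", office: office, tags: tags)

# Bulk create: per-record expensive work is skipped, because
# uniqueness was already validated once for the whole batch
book = Book.new(name: row[:name], office: office, tags: tags)
book.skip_name_uniqueness = true
book.skip_sync_search_index = true
book.save!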
4. Validate in Batches, Not Per Record
Instead of validating each row individually, validate the whole batch upfront:
- One query to check existing book names
- One query to load all offices
- One query to load all tags
This turns N per-record queries into a small, fixed number of batch queries and lets you return all errors at once.
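A sketch of the idea, assuming rows is an array of hashes shaped like the importer below expects:
// language: ruby
names      = rows.pluck(:name)
office_ids = rows.pluck(:office_id).uniq
tag_ids    = rows.pluck(:tag_ids).flatten.uniq

taken_names   = Book.where(name: names).pluck(:name)         # 1 query
offices_by_id = Office.where(id: office_ids).index_by(&:id)  # 1 query
tags_by_id    = Tag.where(id: tag_ids).index_by(&:id)        # 1 query

errors = []
errors << "Names already taken: #{taken_names.join(', ')}" if taken_names.any?

missing_office_ids = office_ids - offices_by_id.keys
errors << "Missing offices: #{missing_office_ids.join(', ')}" if missing_office_ids.any?

missing_tag_ids = tag_ids - tags_by_id.keys
errors << "Missing tags: #{missing_tag_ids.join(', ')}" if missing_tag_ids.any?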
5. Pass Objects, Not IDs
Passing IDs often triggers hidden SELECTs during validation (e.g., office_must_be_active).
// language: ruby
# Good
Book.new(name: name, office: office, tags: tags)

# Bad
Book.new(name: name, office_id: office_id, tag_ids: tag_ids)
Passing objects avoids redundant lookups and makes validation behavior predictable.
BulkBookImporter Service Example
// language: ruby
class BulkBookImporter
  class ImportError < StandardError; end

  def initialize(rows:)
    @rows = rows
    @office_ids = rows.pluck(:office_id).uniq
    @tag_ids = rows.pluck(:tag_ids).flatten.uniq
  end

  # All or nothing: if one row fails, nothing is created
  def perform
    raise ImportError, "No data provided" if @rows.empty?

    validate_offices!
    validate_tags!
    validate_uniqueness!

    created_books = []
    ActiveRecord::Base.transaction do
      @rows.each do |row|
        created_books << create_book!(row)
      end
    end

    sync_search_index!(created_books)
    created_books
  end

  private

  # ---------- Preloading ----------

  def offices_by_id
    @offices_by_id ||= Office.where(id: @office_ids).index_by(&:id)
  end

  def tags_by_id
    @tags_by_id ||= Tag.where(id: @tag_ids).index_by(&:id)
  end

  # ---------- Batch validations ----------

  def validate_offices!
    if offices_by_id.size != @office_ids.size
      raise ImportError, "One or more offices are missing"
    end
  end

  def validate_tags!
    if tags_by_id.size != @tag_ids.size
      raise ImportError, "One or more tags are missing"
    end
  end

  def validate_uniqueness!
    names = @rows.pluck(:name)

    # Duplicates inside the input
    if names.uniq.size != names.size
      raise ImportError, "Duplicate names in input"
    end

    # Duplicates against existing data
    if Book.exists?(name: names)
      raise ImportError, "One or more names already exist"
    end
  end

  # ---------- Creation ----------

  def create_book!(row)
    book = Book.new(
      name: row[:name],
      office: offices_by_id[row[:office_id]],
      tags: row[:tag_ids].map { |id| tags_by_id[id] }
    )
    # Skip expensive per-record work
    book.skip_name_uniqueness = true
    book.skip_sync_search_index = true
    book.save!
    book
  rescue ActiveRecord::RecordNotUnique
    # Final safety net for race conditions
    raise ImportError, "Name already taken"
  end

  # ---------- Side effects ----------

  def sync_search_index!(books)
    return if books.empty?

    BookSearchIndexer.perform_later(ids: books.map(&:id))
  end
end
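Usage from a controller might look like this; the CSV parsing and response shape are illustrative, not part of the service:
// language: ruby
require "csv"

rows = CSV.parse(params[:file].read, headers: true).map do |row|
  {
    name: row["name"],
    office_id: row["office_id"].to_i,
    tag_ids: row["tag_ids"].split(",").map(&:to_i)
  }
end

begin
  books = BulkBookImporter.new(rows: rows).perform
  render json: { created: books.size }, status: :created
rescue BulkBookImporter::ImportError => e
  render json: { error: e.message }, status: :unprocessable_entity
end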
Conclusion
Bulk operations in Rails do not require abandoning Active Record or blindly reaching for external gems. By understanding what validations and callbacks actually do under the hood, and by moving expensive work into batch operations, you can achieve significant performance improvements while still maintaining data correctness.