Optimizing Bulk Create Operations in Rails
Recently, my team ran into performance issues while handling bulk CUD (Create, Update, Delete) operations in a Rails application.
Active Record provides validations and callbacks, which are excellent tools for maintaining data consistency. Rails controllers also offer convenient create, update, and destroy actions that work very well for single-record workflows. However, in most applications where users manage tabular data, bulk actions are unavoidable. Users upload CSV files, select multiple rows, or trigger batch actions in a single request. In these cases, Active Record’s per-record lifecycle can quickly become a performance bottleneck.
In this post, I focus only on bulk create. I will cover update and delete in separate posts.
Example: Why Bulk Create Is Expensive
Consider the following models:
// language: ruby
class Tag < ApplicationRecord
end

class Office < ApplicationRecord
end

class Book < ApplicationRecord
  has_many :book_tags
  has_many :tags, through: :book_tags
  belongs_to :office

  validates :name, length: { maximum: 200 }, presence: true, uniqueness: true
  validates :tags, presence: true
  validates :office, presence: true
  validate :office_must_be_active

  after_create :sync_search_index

  private

  def office_must_be_active
    return if office.nil?

    errors.add(:office, "is inactive") unless office.active?
  end

  def sync_search_index
    BookSearchIndexer.perform(id: id)
  end
end
Creating a single record works fine:
// language: ruby
Book.create(name: "Math Basic", office_id: 100, tag_ids: [10, 12])
Now imagine importing 100 books from a CSV file:
// language: ruby
require "csv"

CSV.foreach("books.csv", headers: true) do |row|
  Book.create(
    name: row["name"],
    office_id: row["office_id"],
    tag_ids: row["tag_ids"].split(",")
  )
end
For each book, Active Record may trigger:
- One SELECT for name uniqueness
- One SELECT to load the office for presence and custom validation
- One or more SELECTs for tags
- One INSERT into books
- Multiple INSERTs into book_tags
- All callbacks, including BookSearchIndexer.perform
Multiply this by 100 rows, and the cost becomes obvious.
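If you want to see the cost concretely, you can count queries during the import with ActiveSupport::Notifications. This is a rough measurement sketch; the exact total depends on your schema and query caching:
// language: ruby
require "csv"

query_count = 0
subscriber = ActiveSupport::Notifications.subscribe("sql.active_record") do |_name, _start, _finish, _id, payload|
  # Skip queries served from the query cache
  query_count += 1 unless payload[:cached]
end

CSV.foreach("books.csv", headers: true) do |row|
  Book.create(
    name: row["name"],
    office_id: row["office_id"],
    tag_ids: row["tag_ids"].split(",")
  )
end

ActiveSupport::Notifications.unsubscribe(subscriber)
puts "Issued #{query_count} queries"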
activerecord-import Is Fast, But Not Always a Fit
The activerecord-import gem is often suggested for bulk inserts. It is fast and uses bulk SQL under the hood.
// language: ruby
require "csv"

books = CSV.read("books.csv", headers: true).map do |row|
  Book.new(
    name: row["name"],
    office_id: row["office_id"],
    tag_ids: row["tag_ids"].split(",")
  )
end

Book.import(books, validate: true)
It builds a single multi-row INSERT under the hood, lets you toggle validations, and skips Active Record callbacks such as after_create entirely. While this is fast, it is often a poor fit for public APIs or user-uploaded data, where correctness, predictable behavior, and clear error reporting matter more than raw speed.
In most real-world cases, we still need validations and callbacks.
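For reference, import reports validation failures through the result object it returns rather than raising per record, and callbacks still never run. A sketch based on the gem's result interface:
// language: ruby
result = Book.import(books, validate: true)

result.failed_instances.each do |book|
  puts book.errors.full_messages.join(", ")
end
# Note: after_create (sync_search_index) did not run for any inserted row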
Optimized Bulk Create Strategy
The idea is simple: move expensive work out of the per-record lifecycle and do it once, in batches.
Core principles:
- Validate data in batches upfront
- Avoid repeated queries per record
- Keep database constraints as the final safety net
- Skip expensive per-record validations and callbacks during bulk inserts
1. Always Start With Database Constraints
Database constraints are non-negotiable. They are the final line of defense.
// language: ruby
add_index :books, :name, unique: true
add_foreign_key :books, :offices
add_foreign_key :book_tags, :books
add_foreign_key :book_tags, :tags
Why this matters:
- Protects data even if validations are skipped
- Handles race conditions correctly
- Covers bugs, direct SQL, or missed checks
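For example, if two concurrent requests both pass the uniqueness validation, the unique index still rejects the second INSERT, and you can handle that explicitly (a minimal sketch):
// language: ruby
begin
  Book.create!(name: "Math Basic", office_id: 100, tag_ids: [10, 12])
rescue ActiveRecord::RecordNotUnique
  # The unique index caught a duplicate that slipped past validation,
  # e.g. two requests inserting the same name at the same time
end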
2. Move Side Effects to Background Jobs
In the example above, BookSearchIndexer.perform might call Elasticsearch or another external service. This is expensive and unnecessary during bulk inserts. If indexing is not required immediately, move it to a background job:
// language: ruby
BookSearchIndexer.perform_later(id: id)
For bulk operations, it is often better to index once for all records instead of once per record.
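One possible shape for that is a single Active Job that receives all the ids, assuming BookSearchIndexer is an Active Job; the indexing call itself is a placeholder, and the single-record callback can pass ids: [id]:
// language: ruby
class BookSearchIndexer < ApplicationJob
  queue_as :default

  def perform(ids:)
    # One job (and ideally one request to the search engine)
    # for the whole batch, instead of one per record
    Book.where(id: ids).find_each do |book|
      # push book to Elasticsearch / your search service here
    end
  end
end

# created_book_ids is a placeholder for the ids collected during the import
BookSearchIndexer.perform_later(ids: created_book_ids)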
3. Make Expensive Validations and Callbacks Skippable
Some validations are expensive when repeated many times, such as uniqueness checks or external side effects. Make them conditional:
// language: ruby
class Book < ApplicationRecord
  attr_accessor :skip_name_uniqueness, :skip_sync_search_index

  has_many :book_tags
  has_many :tags, through: :book_tags
  belongs_to :office

  validates :name, presence: true
  validates :name, uniqueness: true, unless: :skip_name_uniqueness
  validate :office_must_be_active

  after_create :sync_search_index, unless: :skip_sync_search_index

  private

  def office_must_be_active
    return if office.nil?

    errors.add(:office, "is inactive") unless office.active?
  end

  def sync_search_index
    BookSearchIndexer.perform_later(id: id)
  end
end
Rule of thumb:
- Single create: run all validations and callbacks
- Bulk create: validate once upfront, skip per-record expensive checks and run bulk callbacks
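In practice, the two paths look like this (office and tags are assumed to be objects preloaded upfront, as described in the next step):
// language: ruby
# Single create: full lifecycle, all checks and callbacks run
Book.create!(name: "Math Basic", office: office, tags: tags)

# Bulk create: per-record expensive work is skipped, because
# uniqueness was already validated once for the whole batch
book = Book.new(name: row[:name], office: office, tags: tags)
book.skip_name_uniqueness = true
book.skip_sync_search_index = true
book.save!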
4. Validate in Batches, Not Per Record
Instead of validating each row individually, validate the whole batch upfront:
- One query to check existing book names
- One query to load all offices
- One query to load all tags
This turns N per-record queries into a small, fixed number of batch queries and lets you return all errors at once.
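A sketch of the idea, assuming rows is an array of hashes shaped like the importer below expects:
// language: ruby
names      = rows.pluck(:name)
office_ids = rows.pluck(:office_id).uniq
tag_ids    = rows.pluck(:tag_ids).flatten.uniq

taken_names   = Book.where(name: names).pluck(:name)         # 1 query
offices_by_id = Office.where(id: office_ids).index_by(&:id)  # 1 query
tags_by_id    = Tag.where(id: tag_ids).index_by(&:id)        # 1 query

errors = []
errors << "Names already taken: #{taken_names.join(', ')}" if taken_names.any?

missing_office_ids = office_ids - offices_by_id.keys
errors << "Missing offices: #{missing_office_ids.join(', ')}" if missing_office_ids.any?

missing_tag_ids = tag_ids - tags_by_id.keys
errors << "Missing tags: #{missing_tag_ids.join(', ')}" if missing_tag_ids.any?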
5. Pass Objects, Not IDs
Passing IDs often triggers hidden SELECTs during validation (e.g., office_must_be_active).
// language: ruby
# Good
Book.new(name: name, office: office, tags: tags)

# Bad
Book.new(name: name, office_id: office_id, tag_ids: tag_ids)
Passing objects avoids redundant lookups and makes validation behavior predictable.
BulkBookImporter Service Example
// language: ruby
class BulkBookImporter
  class ImportError < StandardError; end

  def initialize(rows:)
    @rows = rows
    @office_ids = rows.pluck(:office_id).uniq
    @tag_ids = rows.pluck(:tag_ids).flatten.uniq
  end

  # All or nothing: if one row fails, nothing is created
  def perform
    raise ImportError, "No data provided" if @rows.empty?

    validate_offices!
    validate_tags!
    validate_uniqueness!

    created_books = []
    ActiveRecord::Base.transaction do
      @rows.each do |row|
        created_books << create_book!(row)
      end
    end

    sync_search_index!(created_books)
    created_books
  end

  private

  # ---------- Preloading ----------

  def offices_by_id
    @offices_by_id ||= Office.where(id: @office_ids).index_by(&:id)
  end

  def tags_by_id
    @tags_by_id ||= Tag.where(id: @tag_ids).index_by(&:id)
  end

  # ---------- Batch validations ----------

  def validate_offices!
    if offices_by_id.size != @office_ids.size
      raise ImportError, "One or more offices are missing"
    end
  end

  def validate_tags!
    if tags_by_id.size != @tag_ids.size
      raise ImportError, "One or more tags are missing"
    end
  end

  def validate_uniqueness!
    names = @rows.pluck(:name)

    # Duplicates inside the input
    if names.uniq.size != names.size
      raise ImportError, "Duplicate names in input"
    end

    # Duplicates against existing data
    if Book.exists?(name: names)
      raise ImportError, "One or more names already exist"
    end
  end

  # ---------- Creation ----------

  def create_book!(row)
    book = Book.new(
      name: row[:name],
      office: offices_by_id[row[:office_id]],
      tags: row[:tag_ids].map { |id| tags_by_id[id] }
    )
    # Skip expensive per-record work
    book.skip_name_uniqueness = true
    book.skip_sync_search_index = true
    book.save!
    book
  rescue ActiveRecord::RecordNotUnique
    # Final safety net for race conditions
    raise ImportError, "Name already taken"
  end

  # ---------- Side effects ----------

  def sync_search_index!(books)
    return if books.empty?

    BookSearchIndexer.perform_later(ids: books.map(&:id))
  end
end
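Usage from a controller might look like this; the CSV parsing and response shape are illustrative, not part of the service:
// language: ruby
require "csv"

rows = CSV.parse(params[:file].read, headers: true).map do |row|
  {
    name: row["name"],
    office_id: row["office_id"].to_i,
    tag_ids: row["tag_ids"].split(",").map(&:to_i)
  }
end

begin
  books = BulkBookImporter.new(rows: rows).perform
  render json: { created: books.size }, status: :created
rescue BulkBookImporter::ImportError => e
  render json: { error: e.message }, status: :unprocessable_entity
end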
Conclusion
Bulk operations in Rails do not require abandoning Active Record or blindly reaching for external gems. By understanding what validations and callbacks actually do under the hood, and by moving expensive work into batch operations, you can achieve significant performance improvements while still maintaining data correctness.