Those of you using ferret 0.11.6 (the latest released gem) and acts_as_ferret 0.4.3 (the latest stable version) may have noticed that rebuilding an index can be painfully slow when working with a large number of documents. Even when each document holds only a small amount of text, indexing crawls once the number of documents gets large. The problem lies in how bulk indexing works: despite the name, “bulk indexing” processes a single document at a time! Fortunately, a simple patch provides a significant speed boost.
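Why does grouping help? A rough way to picture it is to count commits. In ferret 0.11.x, `Index#update` with a String id deletes the old document, commits the writer, and then adds the new one, so updating N documents that way commits N times. A grouped update deletes every id first and commits once. The toy model below is plain Ruby with no Ferret dependency: `ToyIndex` is a hypothetical stand-in that keeps documents in a Hash and only counts commits, so it shows the shape of the two approaches, not real timings.

```ruby
# Toy model only: ToyIndex is a hypothetical stand-in for
# Ferret::Index::Index that just records how many commits happen.
class ToyIndex
  attr_reader :docs, :commits

  def initialize
    @docs = {}    # id => document
    @commits = 0
  end

  # One document at a time: delete, commit, then add.
  def update(id, doc)
    @docs.delete(id)
    @commits += 1
    @docs[id] = doc
  end

  # Grouped: delete every id, commit once, then add all documents.
  def update_batch(batch)
    batch.each_key { |id| @docs.delete(id) }
    @commits += 1
    batch.each { |id, doc| @docs[id] = doc }
  end
end

records = (1..1000).map { |i| [i.to_s, "document #{i}"] }.to_h

one_at_a_time = ToyIndex.new
records.each { |id, doc| one_at_a_time.update(id, doc) }

batched = ToyIndex.new
batched.update_batch(records)

puts one_at_a_time.commits  # prints 1000
puts batched.commits        # prints 1
```

Both indexes end up with identical contents; only the number of commits differs.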
There is a fairly old Trac ticket in which Francois Lagunas posted a clever patch that makes bulk indexing process documents as a group. Here is a monkey patch based on his submission (in Rails, just drop it into a file under config/initializers).
class Ferret::Index::Index
  # Update many documents at once. +docs+ is a Hash of id => document.
  # All ids are deleted first, the writer is committed at most once,
  # and then every new document is added.
  def update_batch(docs)
    @dir.synchrolock do
      ensure_writer_open()
      commit = false
      docs.each_key do |id|
        # Index#delete treats a String/Symbol id as a value of the id field
        # and an Integer as an internal Ferret document number.
        delete(id)
        commit = true if id.is_a?(String) or id.is_a?(Symbol)
      end
      # Commit once for the whole group instead of once per document.
      @writer.commit if commit
      # Deleting by document number opens the reader, which closes the
      # writer, so make sure the writer is open before adding.
      ensure_writer_open()
      docs.each_value do |new_doc|
        @writer << new_doc
      end
      flush() if @auto_flush
    end
  end
end

class ActsAsFerret::BulkIndexer
  def index_records(records, offset)
    docs = {}
    batch_time = measure_time {
      # Collect the whole batch, then hand it to the index in one call.
      records.each { |rec| docs[rec.id] = rec.to_doc if rec.ferret_enabled?(true) }
      @index.update_batch(docs)
    }.to_f
    @work_done = offset.to_f / @model_count * 100.0 if @model_count > 0
    remaining_time = ( batch_time / @batch_size ) * ( @model_count - offset + @batch_size )
    @logger.info "#{@reindex ? 're' : 'bulk '}index model #{@model.name} : #{'%.2f' % @work_done}% complete : #{'%.2f' % remaining_time} secs to finish"
  end
end

If you are building a newer version of ferret yourself, the ferret side of this patch is already included (although you still need to make a slight change on the acts_as_ferret side). Stay tuned for another post about how to do this.
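The progress line that `index_records` logs in the patch above comes from two small formulas: the percentage complete is `offset / model_count * 100`, and the time left extrapolates the last batch's per-record time over `model_count - offset + batch_size` records, exactly as the patch writes it. Here is a standalone version of that arithmetic; `progress_report` is a hypothetical helper name, not part of acts_as_ferret.

```ruby
# Hypothetical helper mirroring the progress arithmetic in
# ActsAsFerret::BulkIndexer#index_records from the patch.
def progress_report(batch_time, batch_size, model_count, offset)
  model_count = model_count.to_f
  # Percentage of records processed so far (0 when the count is unknown).
  work_done = model_count > 0 ? offset / model_count * 100.0 : 0.0
  # Seconds per record in the last batch, times the records left to go.
  remaining_time = (batch_time / batch_size) * (model_count - offset + batch_size)
  format("%.2f%% complete : %.2f secs to finish", work_done, remaining_time)
end

puts progress_report(2.0, 1000, 10_000, 4000)
# prints "40.00% complete : 14.00 secs to finish"
```

With a 1000-record batch taking 2 seconds, each record costs 0.002 seconds, and 7000 counted records remain, giving the 14-second estimate.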