Tuesday 15 May 2012

Thinking Sphinx for Mighty Tables

Recently I was assigned to integrate Thinking Sphinx into a messaging application, and the messaging table had more than 20 million records.

Since I had configured Thinking Sphinx in other projects before, my estimate was that it would take only a couple of hours to add it on the server. This time I was wrong, because...

In Thinking Sphinx, the query we use to add the delta field is something like this:

alter table table_name add column delta boolean default true
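
In a Rails app this column is usually added through a migration rather than raw SQL. Here is a minimal sketch, keeping the same placeholder table_name (the migration class name is made up for illustration):

class AddDeltaToTableName < ActiveRecord::Migration
  def self.up
    # The column defaults to true, so every existing row is flagged for the delta index
    add_column :table_name, :delta, :boolean, :default => true
  end

  def self.down
    remove_column :table_name, :delta
  end
end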

But whenever you run the rake command to build the indexes, it first executes the following query:

update table_name set delta = false where delta = true


Wow, what a beautiful query: in my case it took around 4 hours to complete, and for that entire time the whole table was locked and the server stayed down.

Why does this happen?

Most projects introduce search at a very early stage, when the table holds only a handful of records, so this second query finishes almost instantly and the indexing process starts right after it. In my case, however, every one of the 20 million rows qualified for the WHERE clause (the new column defaults to true), so the statement had to update the entire table in one go, and the table stayed locked for the duration.

The solution is to set the delta value of all existing records to false first, in chunks, and only then run the rake task to build the index.

Something like this:

TableModel.find_in_batches do |records|
  # Pre-set delta to false in small chunks so the indexer's reset query
  # has nothing left to update
  TableModel.update_all({:delta => false}, ["id IN (?)", records.map(&:id)])
end
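
Each iteration touches at most 1,000 rows (the find_in_batches default batch size), so every UPDATE commits quickly and the lock is held only for a moment. Once every row has delta = false, the reset query that the indexing task fires matches nothing and returns almost immediately.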

and afterwards run:

rake thinking_sphinx:index

Cheers :)