Daily Archives: November 29, 2008

Wow – An Index really matters

I am doing some spidering using Python and SQLite 3. I made the spider restartable by storing each retrieved page in the database as it goes. To make sure a restart does not re-retrieve a page, I was doing the following select to check whether a url was already stored:
select text from revisions where url = ? limit 0,1
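For illustration, here is a minimal sketch of how that check might look from Python's sqlite3 module. The revisions table and its url and text columns come from the select above; the column types, the in-memory database, and the helper names already_fetched and store_page are assumptions made up for this example, not the spider's actual code.

```python
import sqlite3

# Assumption: an in-memory database stands in for the spider's real file.
conn = sqlite3.connect(":memory:")
# Assumption: the real table may have more columns; only url and text are known.
conn.execute("CREATE TABLE revisions (url TEXT, text TEXT)")

def already_fetched(conn, url):
    """Return True if a page for this url is already stored (hypothetical helper)."""
    row = conn.execute(
        "SELECT text FROM revisions WHERE url = ? LIMIT 0,1", (url,)
    ).fetchone()
    return row is not None

def store_page(conn, url, text):
    """Save a retrieved page so a restarted spider can skip it (hypothetical helper)."""
    conn.execute("INSERT INTO revisions (url, text) VALUES (?, ?)", (url, text))
    conn.commit()

store_page(conn, "http://example.com/a", "<html>a</html>")
print(already_fetched(conn, "http://example.com/a"))
print(already_fetched(conn, "http://example.com/b"))
```

The spider would call already_fetched before each download and store_page after it, so this one select runs once per candidate url.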
But once there were about 50,000 pages, this started to slow down a lot. With no index on url, every lookup was doing a full table scan. So I stopped the process and added an index:
create unique index revidx on revisions (url) ;
Wow – it is so much faster. Nothing like a ton of data to remind you of the speed difference between O(N) and O(log N).
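One way to see the difference for yourself is to ask SQLite how it plans to run the select, before and after creating the index. This is a rough sketch under the same assumptions as the table definition (only url and text are known columns); the plan helper is invented for this example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: column types are guesses; the post only names url and text.
conn.execute("CREATE TABLE revisions (url TEXT, text TEXT)")

LOOKUP = "SELECT text FROM revisions WHERE url = ? LIMIT 0,1"

def plan(conn):
    """Return SQLite's plan description for the lookup (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("x",)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

before = plan(conn)  # expected to describe a scan of the whole table
conn.execute("CREATE UNIQUE INDEX revidx ON revisions (url)")
after = plan(conn)   # expected to describe a search using revidx

print("before:", before)
print("after: ", after)
```

Note that a unique index can only be created if no url is already stored twice, so on an existing table with duplicates the create statement would fail until they are cleaned up.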
I am liking Python and its built-in support for SQLite 3. Nice.