The goal of this MR is to fix search issue #96. Let's assume we have a user with firstname Jean-François, lastname Du Pont and nickname Ai'gnan. Here is a list of searches that previously did not return him but now do (search was, and still is, case-insensitive):
jean françois (missing -);
jean-francois (missing ç);
jean francois (both);
françois (not the start of his name).
You get the idea: there are a lot of mistakes humans can make. This MR also sorts results by User.last_update so that old accounts don't end up at the top of common requests (such as firstname-only or lastname-only requests).
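To illustrate the ordering, here is a minimal sketch in plain Python. The Hit class and sort_hits function are hypothetical names made up for this example, and it assumes "most recently updated first" is the intended direction; the real code goes through haystack rather than sorting a list by hand.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hit:
    """Hypothetical stand-in for one search result."""
    name: str
    last_update: datetime


def sort_hits(hits):
    # Assumption: most recently updated accounts come first,
    # so stale accounts sink to the bottom of common requests.
    return sorted(hits, key=lambda hit: hit.last_update, reverse=True)


hits = [
    Hit("old account", datetime(2015, 1, 1)),
    Hit("active account", datetime(2023, 6, 1)),
    Hit("less active account", datetime(2019, 3, 1)),
]
for hit in sort_hits(hits):
    print(hit.name, hit.last_update.date())
```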
How it works
For those who don't know, the search is handled by Xapian (the search backend) through the haystack library, which provides a Django-friendly interface to multiple search backends. Xapian maintains a kind of duplicate of the database (only for the models we want to search against), optimised for search operations. Its "models" are called "indexes" (see core.search_indexes.UserIndex for the user model).
Every time a user is created or modified, it is re-indexed (through a signal handler) so that Xapian knows about the change. For the user search, what is indexed is the string rendered by the core/templates/search/indexes/core/user_auto.txt template. For the example above, it looks like this:
jean francois du pont aignan jeanfrancois dupont jeanfrancoisdupont
As you can see, everything is lowercased, accents are stripped and punctuation is removed. There are also near-duplicates with different spacing because we use an autocomplete algorithm, which only matches from the beginning of words.
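For readers who want to play with the idea, here is a rough sketch in plain Python of how such a string could be built and matched. It is not the template's actual code: normalise, index_text and matches are hypothetical helpers, and the prefix matching only approximates what the autocomplete does in Xapian.

```python
import re
import unicodedata


def normalise(value: str) -> str:
    """Lowercase, strip accents, drop apostrophes, turn separators into spaces."""
    # Decompose accented characters (ç becomes c + combining cedilla),
    # then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Apostrophes are removed outright so "Ai'gnan" becomes "aignan".
    no_apostrophe = ascii_only.replace("'", "")
    # Any remaining run of non-alphanumerics (hyphens, spaces) becomes one space.
    return re.sub(r"[^a-z0-9]+", " ", no_apostrophe.lower()).strip()


def index_text(first_name: str, last_name: str, nick_name: str) -> str:
    """Build the indexed string, including the spaceless variants."""
    first, last, nick = (normalise(v) for v in (first_name, last_name, nick_name))
    first_joined = first.replace(" ", "")
    last_joined = last.replace(" ", "")
    parts = [first, last, nick, first_joined, last_joined, first_joined + last_joined]
    return " ".join(parts)


def matches(query: str, indexed: str) -> bool:
    """True if every query word is the beginning of some indexed word."""
    words = indexed.split()
    return all(
        any(word.startswith(term) for word in words)
        for term in normalise(query).split()
    )


indexed = index_text("Jean-François", "Du Pont", "Ai'gnan")
print(indexed)
for query in ["jean françois", "jean-francois", "jean francois", "françois"]:
    print(query, matches(query, indexed))
```

With this sketch, "françois" matches because the hyphen is turned into a space, so "francois" becomes a word of its own, while a query such as "rancois" still does not match since it is not the beginning of any word.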
The one I am not sure about is the last one. Its goal is to allow searching without putting a space between the firstname and lastname. Is this useful?
Prod will have to run ./manage.py update_index; I'm not sure the upgrade script does it.