
Adding ManyToManyField values to SearchVectorField for full-text search in Django

Summary

Join the related values into a single string and pass it to the SearchVector object wrapped in a Value() expression:

from django.contrib.postgres.search import SearchVector, SearchVectorField
from django.db import models
from django.db.models import Value

class MainModel(models.Model):
    (...)
    title = models.CharField(blank=True)
    tags = models.ManyToManyField('TagModel', blank=True)
    search_vector = SearchVectorField(null=True)

    def update_search_vector(self):
        # Join the related values into one space-separated string
        tag_values = ' '.join(self.tags.values_list('field', flat=True))
        self.search_vector = (
            SearchVector('title', weight='A', config='english') +
            SearchVector(Value(tag_values), weight='B', config='english')
        )

class TagModel(models.Model):
    (...)

Cifonauta’s full-text search

During the past year, we’ve been working on several updates to modernize the Cifonauta database. The website now has a fully responsive design and a new dashboard for user-contributed uploads. The new version will probably go live later this year.

As part of this effort, I needed to update our full-text search backend. Cifonauta is a thirteen-year-old Django/PostgreSQL-based application. In ancient times, the search was powered by Elasticsearch. It worked fine, but brought a heavy maintenance burden (for me). As Django’s full-text search capabilities became more established, I migrated to a built-in search setup using the great SearchVector.

Dynamic SearchVector

Initially, I was populating the search vector dynamically in a view, adding data from the plain CharField and TextField of our model, as well as data from different ManyToManyField and ForeignKey related models. For the latter to work, I had to use the StringAgg function to concatenate the “name” field from the related models:

from django.contrib.postgres.aggregates import StringAgg
from django.contrib.postgres.search import SearchVector

def search_view(request):
    (...)
    query = request.GET.get('query', '').strip()
    if query:
        # Build the vector on the fly from local and related fields
        vector = (
            SearchVector('title', weight='A') +
            SearchVector('caption', weight='A') +
            SearchVector(StringAgg('person__name', delimiter=' '), weight='A') +
            SearchVector(StringAgg('tag__name', delimiter=' '), weight='B') +
            SearchVector(StringAgg('sublocation__name', delimiter=' '), weight='B') +
            SearchVector(StringAgg('city__name', delimiter=' '), weight='B') +
            SearchVector(StringAgg('state__name', delimiter=' '), weight='B') +
            SearchVector(StringAgg('country__name', delimiter=' '), weight='B')
        )
        media_list = media_list.annotate(search=vector).filter(search=query)
    (...)

Although this worked, searching was painfully slow.

Static SearchVectorField

To improve search performance, I created a SearchVectorField in our model to store all the indexable information. This search vector field is updated every time the object is saved, via a post_save signal that calls the model’s update_search_vector() method. It is also indexed with GinIndex in the model’s Meta to speed up the searches:

class Meta:
    indexes = (GinIndex(fields=['search_vector']),)

ManyToManyField problem

However, I ran into a problem. While this worked fine for CharField, the StringAgg approach for ManyToManyField did not work when assigning to the stored field and threw this exception:

django.core.exceptions.FieldError: Aggregate functions are not allowed in this query

Other attempts (and I tried many things) raised different exceptions, like:

django.core.exceptions.FieldError: Joined field references are not permitted in this query

Value() solution

After some struggle, the solution turned out to be simpler and more obvious than I thought. You just need to fetch the related values and pass them as a single string wrapped in the Value() expression, like this:

def update_search_vector(self):
    # Fetch the related names and join each set into a single string
    authors = ' '.join(self.authors.values_list('name', flat=True))
    taxa = ' '.join(self.taxa.values_list('name', flat=True))
    self.search_vector = (
        SearchVector(Value(authors), weight='B', config='english') +
        SearchVector(Value(taxa), weight='B', config='english')
    )
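
One caveat with the ' '.join(...) step: it raises a TypeError if any related value is None, which can happen when the related "name" field is nullable. Below is a minimal sketch of a helper that skips None and blank values before joining. The name join_terms and the sample values are hypothetical and only for illustration; they are not part of Cifonauta's code.

```python
from typing import Iterable, Optional


def join_terms(values: Iterable[Optional[str]]) -> str:
    """Join values into one space-separated string.

    None and blank entries are skipped so that ' '.join() never
    receives a non-string item. (Hypothetical helper, not from Django.)
    """
    terms = []
    for value in values:
        if value is None:
            continue
        term = value.strip()
        if term:
            terms.append(term)
    return ' '.join(terms)


# Sample values standing in for a values_list('name', flat=True) result
print(join_terms(['larva', None, '', '  plankton ']))
```

Inside update_search_vector() this could replace the plain join, for example authors = join_terms(self.authors.values_list('name', flat=True)), assuming the helper is importable from the model's module.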

Good performance–maintenance ratio

Now everything works as expected. I can search for “bruno larva” and get relevant results, relatively fast.

Cifonauta's full-text search.

