Implement Dovecot Full-Text Indexing

Submitted by Tom Thorp on Tuesday, September 24, 2024 - 15:05
Modified on Sunday, October 6, 2024 - 13:06

I have been managing my own email infrastructure for well over three years. During this time, my mailboxes have grown substantially, to the point where I now have difficulty retrieving emails from years past through my email clients. My solution was to implement a Xapian full-text search index on each of the mailboxes.

 

Why Xapian?

I chose Xapian as my full-text search engine for a number of reasons. With v2.3, Dovecot deprecated the Lucene and Squat FTS engines in favour of Xapian's open-source library. At the time of writing, Dovecot maintains support for fts-flatcurve; however, fts-flatcurve is only available with Dovecot's 'FTS stemming' feature, which is only offered in the Dovecot Pro FTS Engine (part of OX Dovecot Pro). The community plugins fts-elastic and fts-elasticsearch relied on deprecated code from Dovecot's fts-squat. So rather than adding an extra layer to my server (i.e. Apache Solr and a JRE), I went with the community-developed Xapian plugin (fts-xapian).

 

Installing the Xapian Plugin

Building the Xapian plugin on Fedora requires a number of prerequisites. In your terminal, type:

[fedora@ns ~]$ sudo dnf install sqlite-devel libicu-devel xapian-core-devel
[fedora@ns ~]$ sudo dnf install dovecot-devel git

If you intend to index the contents of your email attachments, a few additional dependencies have to be installed:

[fedora@ns ~]$ sudo dnf install libxml2 unzip poppler-utils catdoc
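To confirm the attachment helpers actually landed on your PATH, a small shell function like the one below can help. It is only a sketch: the command names I would call it with (pdftotext from poppler-utils; catdoc, catppt and xls2csv from the catdoc package; plus unzip) are my assumption based on the example decode2text.sh script that ships with Dovecot, so check them against the copy on your system.

```shell
# Print every command name passed as an argument that is NOT on the PATH.
# Returns 0 if all of them are present, 1 if any are missing.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_commands pdftotext catdoc catppt xls2csv unzip` should print nothing once the packages above are installed.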

Clone the project:

[fedora@ns ~]$ git clone https://github.com/grosjo/fts-xapian
[fedora@ns ~]$ cd fts-xapian

Generate the build files and configure the project:

[fedora@ns fts-xapian]$ autoupdate
[fedora@ns fts-xapian]$ autoreconf -vi
[fedora@ns fts-xapian]$ ./configure --with-dovecot=/path/to/dovecot

If you get a "syntax error near unexpected token 'PKG_CHECK_MODULES'" error, run autoreconf again, pointing it at the directory where the pkg-config macro file "pkg.m4" is located (/usr/local/aclocal in the example below; adjust the path for your system):

[fedora@ns fts-xapian]$ autoreconf -vi -I /usr/local/aclocal
[fedora@ns fts-xapian]$ ./configure --with-dovecot=/path/to/dovecot

Replace /path/to/dovecot with the actual path to the directory containing 'dovecot-config'. On Fedora, this is /usr/lib64/dovecot.
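If you are unsure where dovecot-config lives, the hypothetical helper below checks a few candidate directories and prints the first one that contains it. Only /usr/lib64/dovecot comes from this article; the other default paths are common locations I am assuming for other distributions and source builds.

```shell
# Print the first directory that contains a file named "dovecot-config".
# Directories come from the arguments, or from an assumed default list.
# Returns 1 if none of them contains the file.
find_dovecot_config() {
    if [ "$#" -eq 0 ]; then
        # Fedora path from this article, then assumed common alternatives
        set -- /usr/lib64/dovecot /usr/lib/dovecot /usr/local/lib/dovecot
    fi
    for dir in "$@"; do
        if [ -f "$dir/dovecot-config" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

You could then run `./configure --with-dovecot="$(find_dovecot_config)"`. On Fedora, `rpm -ql dovecot-devel | grep dovecot-config` should also show where the package installed the file.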

Build and install the Xapian plugin:

[fedora@ns fts-xapian]$ make
[fedora@ns fts-xapian]$ sudo make install

Depending on your setup, you may have to 'export PKG_CONFIG_PATH=...' first. To check, run 'pkg-config --cflags-only-I icu-uc icu-io icu-i18n'; it should complete without errors.
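The same pkg-config check can be wrapped in a function that names each ICU module it cannot resolve. This is a sketch: it relies on `pkg-config --exists`, which returns a non-zero status when a module's .pc file cannot be found, and it defaults to the three ICU modules listed above.

```shell
# Report which pkg-config modules cannot be resolved.
# Defaults to the ICU modules fts-xapian needs; returns 1 if any are missing.
check_pkg_modules() {
    [ "$#" -gt 0 ] || set -- icu-uc icu-io icu-i18n
    missing=0
    for mod in "$@"; do
        if ! pkg-config --exists "$mod" 2>/dev/null; then
            echo "missing: $mod"
            missing=1
        fi
    done
    return "$missing"
}
```

If it reports missing modules, locate the directory holding icu-uc.pc (for instance with `find / -name icu-uc.pc 2>/dev/null`) and add that directory to PKG_CONFIG_PATH before running configure again.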

'make install' places the module in the module directory of your Dovecot installation.
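To confirm the module really was installed, you can look in the directory Dovecot loads plugins from. The function below is a sketch: unless you pass a directory explicitly, it reads the mail_plugin_dir setting via `doveconf -h`, and it matches any file name containing fts_xapian rather than assuming an exact library name.

```shell
# List files whose name contains "fts_xapian" in a Dovecot module directory.
# Uses the directory given as $1, otherwise mail_plugin_dir from doveconf.
# Returns 1 if nothing matches.
find_xapian_module() {
    dir="${1:-$(doveconf -h mail_plugin_dir)}"
    found=1
    for f in "$dir"/*fts_xapian*; do
        [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
        echo "$f"
        found=0
    done
    return "$found"
}
```

Running `find_xapian_module` as a user allowed to run doveconf should print at least one path.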

 

Configure Dovecot

Update your Dovecot configuration with settings similar to the following:

conf.d/10-mail.conf

.
.
# $mail_plugins keeps any plugins you have already enabled
mail_plugins = $mail_plugins fts fts_xapian

conf.d/10-master.conf

.
.
service indexer-worker {
  vsz_limit = 5GB   # Max memory allocated for indexing mailboxes
}
# NOTE : If you require indexing of attachments, uncomment this service.
# service decode2text {
#     executable = script /usr/libexec/dovecot/decode2text.sh
#     user = dovecot
#     unix_listener decode2text {
#        mode = 0666
#     }
# }
  

conf.d/90-plugins.conf

plugin {
  fts = xapian
  fts_xapian = partial=3 full=20 verbose=0
  fts_autoindex = yes
  fts_enforced = yes
  fts_autoindex_exclude = \Trash
  fts_autoindex_exclude2 = \Junk
  
# Un-comment if indexing attachments
#   fts_decoder = decode2text  
}

Once done, verify that the configuration has no problems by running the following commands:

[fedora@ns fts-xapian]$ sudo doveconf 1>/dev/null
[fedora@ns fts-xapian]$ echo $?
0    <== error code must be 0
[fedora@ns fts-xapian]$ 
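An exit code of 0 only tells you the configuration parses. To check that the fts settings above actually made it into the effective configuration, you could pipe `doveconf -n` (which prints the settings that differ from the defaults) into a function like this one. It is a sketch that simply greps for the mail_plugins and fts lines used in this article.

```shell
# Read `doveconf -n` output on stdin and check for the fts-xapian settings.
# Prints a message for each missing setting; returns 1 if any are missing.
check_fts_config() {
    conf="$(cat)"
    rc=0
    printf '%s\n' "$conf" | grep -Eq '^[[:space:]]*mail_plugins = .*fts_xapian' \
        || { echo "mail_plugins does not load fts_xapian"; rc=1; }
    printf '%s\n' "$conf" | grep -Eq '^[[:space:]]*fts = xapian' \
        || { echo "plugin setting 'fts = xapian' not found"; rc=1; }
    return "$rc"
}
```

Usage: `sudo doveconf -n | check_fts_config && echo 'fts settings look OK'`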

If the exit code is 0, restart Dovecot:

[fedora@ns fts-xapian]$ sudo systemctl restart dovecot

Check the system journal for any errors and correct them accordingly.
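To narrow the journal down to the lines that matter, you could filter recent Dovecot entries through a small helper. The search pattern (errors, fatals, panics, and anything mentioning fts or xapian) is my own choice, not an official list of Dovecot messages.

```shell
# Read log lines on stdin and print only those that look like problems or
# mention FTS/Xapian (case-insensitive). Returns grep's status: 0 if any
# line matched, 1 if none did.
dovecot_problems() {
    grep -iE 'error|fatal|panic|fts|xapian'
}
```

Usage: `sudo journalctl -u dovecot --since "10 minutes ago" --no-pager | dovecot_problems`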

 

Index Mailboxes

Depending on the number of mailboxes (and size of each) on your Dovecot server, this can be the most time-consuming part of the process. So if you are a coffee drinker, now would probably be a good time to grab one. 

To start indexing every mailbox for all users, type:

[fedora@ns fts-xapian]$ sudo doveadm index -A \* 

Make sure you have enough free disk space before you start.
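A quick way to compare free space with the size of your existing mail store is shown below. It is a sketch: both functions use POSIX `df -P` and `du -s` with kilobyte output, and which directory to point them at depends on your mail_location (you can see it with `doveconf -h mail_location`); /var/mail in the usage line is only an example.

```shell
# Free space, in MB, on the filesystem that holds directory $1.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Disk space, in MB, used by directory $1 and everything below it.
used_mb() {
    du -sk "$1" | awk '{ print int($1 / 1024) }'
}
```

Run as root so du can read every mailbox: `echo "mail: $(used_mb /var/mail) MB, free: $(free_mb /var/mail) MB"`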

While indexing is in progress, run 'top' to monitor the memory used by the 'indexer-worker' process. If the process exceeds its allocated memory and produces core dumps, stop the Dovecot service, increase vsz_limit for the 'indexer-worker' service (in conf.d/10-master.conf), restart Dovecot and run the indexing again. Indexing is complete when the indexer-worker process stops.
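If you prefer something more targeted than top, ps can report just the indexer-worker processes. This sketch relies on the procps `ps -C` option (standard on Fedora) and prints memory in KiB. Since vsz_limit caps a process's address space, VSZ is the column to compare against the value in 10-master.conf.

```shell
# Print PID, resident memory (RSS, KiB), virtual size (VSZ, KiB) and elapsed
# time for every process named $1 (default: indexer-worker). The "=" after
# each field suppresses the header; nothing is printed if no such process runs.
indexer_mem() {
    ps -C "${1:-indexer-worker}" -o pid=,rss=,vsz=,etime= 2>/dev/null
}
```

To sample every five seconds: `while sleep 5; do indexer_mem; echo; done` (press Ctrl+C to stop). Once only blank lines appear, no indexer-worker process is running.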

 

Regular Optimisation

You can schedule a regular optimisation of the full-text indexes via crontab, at whatever interval suits you. The following example runs it every day at 4:30 am:

# crontab -e
...
30   4   *   *   *   /usr/bin/doveadm fts optimize -A

 

Testing

Once the full-text indexes have been created, you can test them by running a content search in your favourite email client. I have noticed a marked improvement in search speed in Roundcube (a webmail client) as well as BlueMail (an email app on Android). Whichever email client you prefer, your searches should return noticeably faster than before.
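You can also test from the server itself, without involving a mail client, using doveadm's search command. The wrapper below is a sketch: the user alice@example.com and the search term in the usage line are hypothetical, and DOVEADM can be set to echo for a dry run that only prints the command it would run.

```shell
# Search one user's mailbox for messages whose body contains a term.
#   $1 = user, $2 = search term, $3 = mailbox (default INBOX)
# doveadm should print one "mailbox-GUID UID" line per matching message.
# Override DOVEADM (e.g. DOVEADM=echo) to print the command instead.
fts_search() {
    user="$1"
    term="$2"
    mailbox="${3:-INBOX}"
    ${DOVEADM:-doveadm} search -u "$user" mailbox "$mailbox" body "$term"
}
```

As root, in bash: `time fts_search alice@example.com invoice` gives a rough timing to compare against searches from your email client.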

 

In Conclusion

Full-text indexing is an often overlooked feature when setting up a Dovecot server. If your end-users have large mailboxes and have been complaining about slow searches, and you have not yet implemented full-text indexing, it is worth investing the time to add it to your Dovecot server. Your end-users will thank you for it.

 

About the author

Tom Thorp
Tom Thorp is an IT Consultant living in Miami on Queensland's Gold Coast. With more than 30 years in the IT industry, Tom's experience is a broad canvas. The IT services Tom provides to his clients include:
 
Website development and hosting
Database Administration
Server Administration (Windows, Linux, Apple)
PABX Hosting and Administration
Helpdesk Support (end-user & technical)