I have been running my own email infrastructure for well over three years. In that time my mailboxes have grown substantially, to the point where my email clients now struggle to retrieve messages from years past. The solution is to add a Xapian full-text search (FTS) index to each mailbox.
Why Xapian?
I chose Xapian as my full-text search index for a number of reasons. With Dovecot v2.3, Dovecot decided to deprecate the Lucene and Squat FTS engines in favour of Xapian's open-source library. Dovecot (at the time of writing) maintains support for fts-flatcurve. However, fts-flatcurve is only available with Dovecot's 'FTS stemming' feature, which is only available in the Dovecot Pro FTS Engine (part of OX Dovecot Pro). The community plugins fts-elastic and fts-elasticsearch relied on deprecated code from Dovecot's fts-squat. So rather than adding another layer to my server (i.e. Apache Solr and a JRE), I went with the community-developed Xapian plugin (fts-xapian).
Installing the Xapian Plugin
Building the Xapian plugin on Fedora requires a number of prerequisites. In your terminal, type:
[fedora@ns ~]$ sudo dnf install sqlite-devel libicu-devel xapian-core-devel
[fedora@ns ~]$ sudo dnf install dovecot-devel git
If you intend to index the contents of your email attachments, install these additional dependencies :
[fedora@ns ~]$ sudo dnf install libxml2 unzip poppler-utils catdoc
Clone the project :
[fedora@ns ~]$ git clone https://github.com/grosjo/fts-xapian
[fedora@ns ~]$ cd fts-xapian
Compile and install the project :
[fedora@ns fts-xapian]$ autoupdate
[fedora@ns fts-xapian]$ autoreconf -vi
[fedora@ns fts-xapian]$ ./configure --with-dovecot=/path/to/dovecot
If you get a "syntax error near unexpected token 'PKG_CHECK_MODULES'" error, re-run autoreconf with -I pointing at the directory that holds the pkg-config macro file "pkg.m4" (the location varies between systems), then run configure again. Thus:
[fedora@ns fts-xapian]$ autoreconf -vi -I /usr/local/aclocal
[fedora@ns fts-xapian]$ ./configure --with-dovecot=/path/to/dovecot
Replace /path/to/dovecot with the directory that contains 'dovecot-config'. On Fedora, it is /usr/lib64/dovecot.
Make and install the Xapian plugin :
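If you are unsure where 'dovecot-config' lives on your system, a small helper along these lines can find it. This is only a sketch: the function name find_dovecot_config_dir and the search roots in the usage comment are my own assumptions, not part of fts-xapian.

```shell
#!/usr/bin/env bash
# Print the first directory under the given search roots that contains
# a file named dovecot-config. Returns non-zero if none is found.
find_dovecot_config_dir() {
    local hit
    hit=$(find "$@" -type f -name dovecot-config 2>/dev/null | head -n 1)
    [ -n "$hit" ] || return 1
    dirname "$hit"
}

# Usage (search roots are assumptions; adjust for your distribution):
#   dir=$(find_dovecot_config_dir /usr/lib64 /usr/lib /usr/local/lib) \
#       && ./configure --with-dovecot="$dir"
```

Passing the result straight to ./configure avoids typing the path by hand.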
[fedora@ns fts-xapian]$ make
[fedora@ns fts-xapian]$ sudo make install
On some systems you may have to 'export PKG_CONFIG_PATH=...' before running configure. To check, type 'pkg-config --cflags-only-I icu-uc icu-io icu-i18n'; it should return without error.
The module will be installed into Dovecot's module directory.
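To confirm that the plugin actually landed in the module directory, something like the following can help. It is a sketch under assumptions: the helper name list_xapian_modules and the '*fts_xapian*' file-name pattern are mine, and the usage line relies on 'doveconf -h mail_plugin_dir' reporting the directory Dovecot loads plugins from.

```shell
#!/usr/bin/env bash
# List fts_xapian plugin files found in a Dovecot module directory.
# The *fts_xapian* pattern is an assumption about the installed file name.
# Returns non-zero when nothing matches.
list_xapian_modules() {
    local dir=$1 found=1 f
    for f in "$dir"/*fts_xapian*; do
        [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
        printf '%s\n' "$f"
        found=0
    done
    return "$found"
}

# Usage:
#   list_xapian_modules "$(doveconf -h mail_plugin_dir)"
```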
Configure Dovecot
Update your Dovecot configuration with something similar to :
conf.d/10-mail.conf
.
.
mail_plugins = fts fts_xapian
conf.d/10-master.conf
.
.
service indexer-worker {
  vsz_limit = 5GB # Max memory allocated for indexing mailboxes
}
# NOTE : If you require indexing of attachments, uncomment this service.
# service decode2text {
#   executable = script /usr/libexec/dovecot/decode2text.sh
#   user = dovecot
#   unix_listener decode2text {
#     mode = 0666
#   }
# }
conf.d/90-plugins.conf
plugin {
  fts = xapian
  fts_xapian = partial=3 full=20 verbose=0
  fts_autoindex = yes
  fts_enforced = yes
  fts_autoindex_exclude = \Trash
  fts_autoindex_exclude2 = \Junk
  # Un-comment if indexing attachments
  # fts_decoder = decode2text
}
Once completed, verify that there are no problems with the configuration by running the following commands :
[fedora@ns fts-xapian]$ sudo doveconf 1>/dev/null
[fedora@ns fts-xapian]$ echo $?
0 <== exit code must be 0
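A clean exit code only tells you the configuration parses. As an extra sanity check, you could confirm that both plugins appear in the effective mail_plugins value. This is a sketch; the helper name plugins_include_xapian is my own, and it assumes 'doveconf -h mail_plugins' prints the global value as a space-separated list.

```shell
#!/usr/bin/env bash
# Succeed only if the space-separated plugin list in $1 contains
# both "fts" and "fts_xapian" as whole words.
plugins_include_xapian() {
    case " $1 " in
        *" fts "*) ;;
        *) return 1 ;;
    esac
    case " $1 " in
        *" fts_xapian "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   plugins_include_xapian "$(doveconf -h mail_plugins)" && echo "FTS plugins enabled"
```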
If the exit code is 0, restart Dovecot :
[fedora@ns fts-xapian]$ sudo systemctl restart dovecot
Check the system journal for errors, and correct any that appear.
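One way to do that check is to pipe recent journal entries through a filter that keeps only Dovecot's problem lines. This is a sketch: the helper name filter_dovecot_problems is mine, and it assumes Dovecot logs with its usual "Error:", "Fatal:", "Panic:" and "Warning:" prefixes and that the systemd unit is called dovecot.

```shell
#!/usr/bin/env bash
# Read log lines on stdin and keep only those carrying one of Dovecot's
# severity prefixes. Always returns 0, even when nothing matches.
filter_dovecot_problems() {
    grep -E '(Error|Fatal|Panic|Warning):' || true
}

# Usage (unit name "dovecot" is an assumption):
#   sudo journalctl -u dovecot --since "10 minutes ago" --no-pager | filter_dovecot_problems
```

No output means nothing was flagged in the chosen time window.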
Index Mailboxes
Depending on the number of mailboxes (and size of each) on your Dovecot server, this can be the most time-consuming part of the process. So if you are a coffee drinker, now would probably be a good time to grab one.
To initiate the indexing on all your mailboxes, type :
[fedora@ns fts-xapian]$ sudo doveadm index -A \*
Make sure you have enough free disk space for the indexes before commencing.
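A quick pre-flight check for free space might look like the sketch below. The helper name has_free_space, the /var/vmail path and the 10240 MiB threshold are all hypothetical; substitute the filesystem that holds your mail storage and a figure that suits your mailbox sizes.

```shell
#!/usr/bin/env bash
# Succeed if the filesystem holding directory $1 has at least $2 MiB free.
has_free_space() {
    local dir=$1 need_mib=$2 avail_kib
    # -P forces one line per filesystem; column 4 is available space in KiB.
    avail_kib=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kib" -ge $((need_mib * 1024)) ]
}

# Usage (path and threshold are placeholders):
#   has_free_space /var/vmail 10240 || echo "Less than 10 GiB free, indexing may fail"
```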
While the indexing is running, type 'top' and press Enter to monitor the memory used by the 'indexer-worker' process. If you notice the 'indexer-worker' process exceeding its allocated memory and dumping core, stop the Dovecot service, increase vsz_limit in the 'indexer-worker' service (in conf.d/10-master.conf), restart Dovecot and attempt the indexing again. Indexing is complete when the indexer-worker process exits.
Regular Optimisation
You can optimise the full-text indexes on whatever schedule you prefer via crontab. For example, to run the optimisation at 04:30 every day :
# crontab -e
...
30 4 * * * /usr/bin/doveadm fts optimize -A
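Cron jobs fail silently unless you capture their output. If you would rather keep a record of each run, the cron entry could call a small wrapper script instead. This is a sketch only: the function name run_logged and the log path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Run a command, appending its output to a log file followed by a
# timestamped line recording the exit status. Returns the command's status.
run_logged() {
    local log=$1; shift
    local status=0
    "$@" >>"$log" 2>&1 || status=$?
    printf '%s exit=%s cmd=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$*" >>"$log"
    return "$status"
}

# Usage inside a script invoked by cron (log path is a placeholder):
#   run_logged /var/log/dovecot-fts-optimize.log /usr/bin/doveadm fts optimize -A
```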
Testing
Once the full-text indexes have been created, you can test them by running a search on message content from your favourite email client. I have noticed a marked improvement in search speed using Roundcube (a webmail client), as well as BlueMail (an email app for Android). Whichever email client you prefer, your search results should come back markedly quicker than before.
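You can also run a search from the server's command line, which takes the email client out of the picture. The sketch below wraps 'doveadm search' and counts the matches; the helper name count_body_matches, the user address and the search term are placeholders of mine.

```shell
#!/usr/bin/env bash
# Count messages in a user's mailbox whose body matches a search term.
# doveadm search prints one line (mailbox GUID and UID) per matching message.
count_body_matches() {
    local user=$1 mailbox=$2 term=$3
    doveadm search -u "$user" mailbox "$mailbox" body "$term" | wc -l
}

# Usage (user and term are placeholders):
#   time count_body_matches user@example.com INBOX invoice
```

Prefixing the call with 'time' gives a rough feel for how quickly the search returns.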
In Conclusion
Full-text indexing is an often-overlooked step when setting up a Dovecot server. If your end-users have large mailboxes and have been complaining of slow searches, and you have not yet implemented full-text indexing, it is worth investing the time to add it to your Dovecot server. Your end-users will thank you for it.
About the author
Tom Thorp is an IT consultant living in Miami on Queensland's Gold Coast. With over 30 years working in the IT industry, Tom's experience is a broad canvas. The IT services Tom provides to his clients include :
Website development and hosting
Database Administration
Server Administration (Windows, Linux, Apple)
PABX Hosting and Administration
Helpdesk Support (end-user & technical)