The Bayes expiry module provides intelligent expiration of statistical tokens for the new schema of Redis statistics storage.
Configure the Bayes expiry module in the appropriate classifier section, for example in the local.d/classifier-bayes.conf file. Because the module requires the new statistics schema, that schema must be enabled in the classifier configuration:
new_schema = true; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
The following settings are valid:
expire: sets the TTL for tokens. Tokens in the common category are not affected. See the expiration modes for details. Supported values are:
    -1: make tokens persistent;
    false: disable bayes expiry for the classifier. Note that this does not change the TTLs of existing tokens, but newly learned tokens will be persistent.
lazy: set to true to enable lazy expiration mode (disabled by default). See the expiration modes for details.
new_schema = true;    # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
expire = 8640000;
#lazy = true;    # Before 2.0
The Bayes expiry module performs an expiry step every minute. During each step, it examines the frequency of approximately 1000 statistical tokens and adjusts their TTLs if needed. The duration of a full iteration depends on the number of tokens; for example, a full cycle for 10 million tokens takes approximately one week. Once the module finishes a full iteration, it starts over.
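The one-week figure follows from the step size and interval given above. A minimal sketch of the arithmetic, assuming exactly 1000 tokens per step and one step per 60 seconds (the function name `full_cycle_days` is made up for illustration):

```python
# Figures taken from the text: one expiry step per minute, ~1000 tokens per step.
TOKENS_PER_STEP = 1000
STEP_INTERVAL_SECONDS = 60
SECONDS_PER_DAY = 86400


def full_cycle_days(total_tokens: int) -> float:
    """Estimate how many days one full expiry iteration takes."""
    # Ceiling division: a partially filled last step still costs a full step.
    steps = -(-total_tokens // TOKENS_PER_STEP)
    return steps * STEP_INTERVAL_SECONDS / SECONDS_PER_DAY


print(round(full_cycle_days(10_000_000), 2))  # 6.94
```

For 10 million tokens this gives 10,000 steps, or just under seven days, which matches the estimate of approximately one week.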
The Bayes expiry module categorizes tokens into four groups based on how often they occur in the ham and spam classes:
The default mode was removed in Rspamd 2.0 because it offers no benefits compared to the lazy mode.
Extend a significant token’s lifetime: update the token’s TTL every time to the expire value.
Discriminate a common token: reset the TTL to a low value (10d) if the token has a greater TTL.
Tokens are kept only for the expire time. The Bayes expiry module must periodically update their TTLs, which means special backup procedures are required: a plain copy of the *.rdb file is not a durable backup, because its contents expire once the expire time has passed.
significant tokens, constant updating of their TTLs is not necessary.
Lazy mode has been the only expiration mode since Rspamd 2.0.
In this mode, significant tokens are kept persistently: if a significant token has a TTL, the module sets it to -1, which makes the token persistent. The TTL of an infrequent token is reduced to the expire value if its current TTL exceeds that value. Common tokens are discriminated by resetting their TTL to 10 days if it exceeds this threshold.
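The lazy-mode rules above can be summarized as a small decision function. This is an illustrative sketch only, not the module's actual implementation: the name `lazy_new_ttl`, the category strings, and the treatment of a persistent token (TTL -1) as exceeding any finite threshold are assumptions. How a token is assigned to a category is not modeled here.

```python
from typing import Optional

PERSISTENT = -1           # Redis reports TTL -1 for a key that has no expiry
COMMON_TTL = 10 * 86400   # 10 days, the lower value applied to common tokens


def lazy_new_ttl(category: str, ttl: int, expire: int) -> Optional[int]:
    """Return the TTL to set for a token, or None to leave it unchanged.

    Assumptions: `expire` is a positive number of seconds (the -1 and
    disabled settings are not modeled), and a persistent token counts as
    exceeding any finite threshold.
    """
    def exceeds(limit: int) -> bool:
        return ttl == PERSISTENT or ttl > limit

    if category == "significant":
        # Make the token persistent unless it already is.
        return PERSISTENT if ttl != PERSISTENT else None
    if category == "infrequent":
        # Only ever reduce the TTL, never raise it.
        return expire if exceeds(expire) else None
    if category == "common":
        return COMMON_TTL if exceeds(COMMON_TTL) else None
    return None  # any other category: leave the TTL alone


expire = 8640000  # 100 days, as in the configuration example
print(lazy_new_ttl("significant", 5000, expire))    # -1
print(lazy_new_ttl("infrequent", 9000000, expire))  # 8640000
print(lazy_new_ttl("common", 500000, expire))       # None
```

The function only ever shortens a TTL or removes it. This matches the text, where no rule raises an existing finite TTL to a larger finite value.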
The advantages of the “lazy” mode include:
To activate the lazy expiration mode in Rspamd versions prior to 2.0, add lazy = true; to the classifier configuration.
The expiration mode for an existing statistics database can be changed in the configuration at any time. Token TTLs will be updated as required during the subsequent expiration cycle.
When expire is set to a lower value than the current one, TTLs exceeding the new expire value are updated during the next expiration cycle.
To increase the expire value, first make the tokens persistent by setting expire = -1; and wait until at least one expiration cycle has completed. Only then can the new expire value be set.
The memory usage of the statistics dataset can be limited with the Redis maxmemory directive and the volatile-ttl eviction policy. When memory usage exceeds the maxmemory limit, Redis evicts keys with shorter TTLs first, in accordance with the policy. Memory usage can also be kept nearly constant by setting the TTL to an extremely high value, so that keys are evicted rather than expired.
To ensure that the memory limit and eviction policy apply only to the Bayesian statistics dataset, store it in a separate Redis instance. A detailed explanation of configuring multi-instance Redis can be found in the Redis replication tutorial.
backend = "redis";    # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
servers = "localhost:6378";
new_schema = true;    # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
expire = 2144448000;
lazy = true;    # Before 2.0
expire = 2144448000; sets a very high TTL of 68 years, as there is no need for the actual expiration of keys.
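The 68-year figure can be checked with a short conversion. This sketch assumes 365-day years; the helper name `ttl_years` is made up for illustration.

```python
SECONDS_PER_DAY = 86400
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def ttl_years(seconds: int) -> float:
    """Convert a TTL in seconds to years of 365 days."""
    return seconds / SECONDS_PER_DAY / DAYS_PER_YEAR


print(ttl_years(2144448000))  # 68.0
```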
include /usr/local/etc/redis.conf
port 6378
pidfile /var/run/redis/bayes.pid
logfile /var/log/redis/bayes.log
dbfilename bayes.rdb
dir /var/db/redis/bayes/
maxmemory 500MB
maxmemory-policy volatile-ttl
maxmemory 500MB limits the memory used for the instance’s dataset to the specified amount, and maxmemory-policy volatile-ttl selects the eviction policy that Redis applies when the maxmemory limit is reached.