With the new version 1.4, we’ve refactored some of the main parts of mosparo, especially the management of the rules and rule packages. Read more about the new features and changes in this post.

New features

Refactoring rules

In early January, StrangerGithuber asked us if and how it is possible to manage a high number of rule items in mosparo. StrangerGithuber developed a script to import the rule items.

However, we observed that mosparo struggles to manage large numbers of rule items. To solve this, we thought about a better way to manage the rules and rule items in mosparo.

Our idea was to replace the simple form with an editable table in which users manage rule items directly. Changes are stored automatically in the backend. To make this possible, we added an extra step to the rule creation process in which the user enters the rule’s name. After that, the rule is created and can be managed as before.

To make adding items easier, we’ve added the ability to import rule items from a text or CSV file directly from the user’s computer. We’ve also added options to filter the list of rule items and to delete multiple items at once.
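As a rough illustration of what such an import involves, the following Python sketch parses a text or CSV file into rule items. It is not mosparo’s implementation (mosparo is written in PHP); the column layout (value first, optional rating second), the default rating, and the name `parse_rule_items` are assumptions made for this example.

```python
import csv
import io


def parse_rule_items(text, default_rating=1.0):
    """Parse 'value[,rating]' lines into rule items (hypothetical file layout)."""
    items = []
    seen = set()
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and rows without a value.
        if not row or not row[0].strip():
            continue
        value = row[0].strip()
        # Keep only the first occurrence of a value.
        if value in seen:
            continue
        seen.add(value)
        has_rating = len(row) > 1 and row[1].strip()
        rating = float(row[1]) if has_rating else default_rating
        items.append({"value": value, "rating": rating})
    return items


items = parse_rule_items("casino,5\n\nlottery\ncasino,2\n")
```

A plain text file with one value per line is handled by the same function, because each line is then a CSV row with a single column.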

Suggested by StrangerGithuber

Refactoring rule packages

While discussing the issue with the high number of rule items, we received a suggestion from Digi92 to allow loading rule packages from a local file, rather than only from a URL as before.

We discussed this idea and concluded that we could add four different options for importing a rule package in mosparo. Technically, we split the four options into two groups: automatically and manually imported rule packages.

Automatic rule packages are available from a URL or a file path and are automatically imported by mosparo from time to time.

Manual rule packages are packages that the user has to import. The idea behind this is that the user may know exactly when the rule package changes and can therefore start the import manually. For this purpose, mosparo v1.4 offers a new CLI command that imports the rule package.

The fourth option to import a rule package is the newly added API endpoint. With this API endpoint, it is possible to import the rule package automatically from a different tool.

When discussing the options, we also noticed that it would be good to offer a tool to create the rule package. We have started a new project, called mosparo RPG (Rule Package Generator), which is still in development but will offer the required tool to convert a text or CSV file (and more) into the structure necessary to import it as a rule package.

The benefit of using the mosparo RPG over direct import of text or CSV files into a rule is that the RPG can be executed automatically and can utilize multiple sources to retrieve rule items. mosparo RPG is not yet ready for production. We’ll publish more information as soon as the first version is available.

Suggested by Digi92

Cache rules and rule packages

Handling large numbers of rule items can be very challenging. When refactoring the rules and rule packages, we found that our validation logic was very slow. At first, we thought the most straightforward approach was to add a cache. But with thousands or even millions of rule items, loading all of them is not fast.

To solve the issue, we decided to store the IDs of the found rule items (see “Renewed validation logic” below) for a specific value in the cache. When the exact same value is checked again, mosparo loads the IDs from the cache and knows the required rule items without any other SQL query.

Between the submissions of different users, many form values change, which means they won’t hit the cache. But other values will hit the cache. For instance, equal user agents, AS numbers, country codes, and possibly IP addresses can hit the cache since many users visit a website from the same country or provider. Also, if the user submits the form twice, the second request will use the cache for most values, as long as the user has not changed all the fields. 
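The idea can be sketched in a few lines of Python. This is a simplified illustration, not mosparo’s code: the class name `RuleItemIdCache`, the key format, and the `lookup` callback (standing in for the SQL query) are assumptions, and a real installation would use its configured cache backend instead of a dictionary.

```python
import hashlib


class RuleItemIdCache:
    """Maps a (field type, value) pair to the IDs of the matching rule items."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(field_type, value):
        # Hash the value so that long or unusual values still give a short key.
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        return f"{field_type}:{digest}"

    def get_or_lookup(self, field_type, value, lookup):
        key = self._key(field_type, value)
        if key not in self._store:
            # Cache miss: run the (expensive) lookup once and remember the IDs.
            self._store[key] = lookup(field_type, value)
        return self._store[key]


queries = []


def fake_lookup(field_type, value):
    # Stand-in for the SQL query that finds the matching rule items.
    queries.append(value)
    return [7, 12] if value == "Mozilla/5.0" else []


cache = RuleItemIdCache()
first = cache.get_or_lookup("user_agent", "Mozilla/5.0", fake_lookup)
second = cache.get_or_lookup("user_agent", "Mozilla/5.0", fake_lookup)
```

The second call returns the same IDs without running the lookup again, which is the effect described above for repeated user agents, country codes, or AS numbers.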

Disable spam detection via security policy

In some cases, it is required to turn off the spam detection for a subnet, a country, or (maybe not that often) an AS number. For example, you may want to prevent mosparo from blocking submissions from your IP address, but allow it to block submissions from all other users.

To achieve this, we’ve added an option to turn off the spam detection in the security policy. With this new feature, you can create one security policy for your IP address and disable the detection there, while all the other users use the general security settings.

If the detection is disabled for a user, mosparo still validates the form values and stores the submission in the database, but does not block any submissions. You can find these submissions alongside all other submissions, but in the detection column, mosparo shows that the spam detection was disabled for this submission.
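The behavior described here can be sketched as follows. This Python example is only an illustration under assumptions: the `SecurityPolicy` class, the `decide` function, the numeric threshold, and the result fields are hypothetical and do not reflect mosparo’s internal data model.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class SecurityPolicy:
    name: str
    subnets: list = field(default_factory=list)
    spam_detection_enabled: bool = True

    def matches(self, ip):
        address = ipaddress.ip_address(ip)
        return any(address in ipaddress.ip_network(s) for s in self.subnets)


def decide(ip, spam_score, threshold, policies):
    """Rate the submission as usual, but only block if detection is enabled."""
    policy = next((p for p in policies if p.matches(ip)), None)
    detection_enabled = policy.spam_detection_enabled if policy else True
    rated_as_spam = spam_score >= threshold
    return {
        "spam_detection": "enabled" if detection_enabled else "disabled",
        "rated_as_spam": rated_as_spam,
        "blocked": rated_as_spam and detection_enabled,
    }


policies = [
    SecurityPolicy("Office", ["203.0.113.0/24"], spam_detection_enabled=False),
]
office = decide("203.0.113.10", 9.0, 5.0, policies)
visitor = decide("198.51.100.1", 9.0, 5.0, policies)
```

In this sketch, the office submission is still rated but never blocked, while the visitor falls back to the general settings and is blocked.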

Suggested by thelfensdrfer

Cleanup execution and statistics

mosparo automatically removes old data from the database after a specified time range. This logic is controlled by the next cleanup execution time, which is stored in the local cache of the mosparo installation. If mosparo is installed in a multi-node setup, it is required to use a shared cache so that all nodes will use the same cleanup execution time.

If the shared cache is not set up, mosparo executes the cleanup process too often. Additionally, after the shared cache is cleared, mosparo no longer knows when to clean the database next.

To fix this issue and provide a clearer overview of when mosparo cleaned the database, we’ve added cleanup statistics. After every execution of the database cleanup process, mosparo stores the key figures of the cleanup in the database.

mosparo then uses these statistics to determine the next execution time. Even if the shared cache is cleared, mosparo can always determine the next execution time from the stored cleanup statistics. The shared cache is no longer required on a multi-node setup to control the execution of the database cleanup.
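A minimal sketch of this approach, assuming a hypothetical `cleanup_statistic` table and a fixed cleanup interval (both invented for this example, not taken from mosparo), could look like this in Python:

```python
import sqlite3
from datetime import datetime, timedelta

CLEANUP_INTERVAL = timedelta(hours=6)  # assumed interval, for illustration only


def create_schema(conn):
    conn.execute(
        "CREATE TABLE cleanup_statistic ("
        "id INTEGER PRIMARY KEY, executed_at TEXT, deleted_submissions INTEGER)"
    )


def record_cleanup(conn, executed_at, deleted_submissions):
    # Store the key figures of one cleanup run.
    conn.execute(
        "INSERT INTO cleanup_statistic (executed_at, deleted_submissions) "
        "VALUES (?, ?)",
        (executed_at.isoformat(), deleted_submissions),
    )


def next_cleanup_time(conn):
    # Derive the next execution time from the latest stored run.
    row = conn.execute("SELECT MAX(executed_at) FROM cleanup_statistic").fetchone()
    if row[0] is None:
        return None  # no cleanup recorded yet
    return datetime.fromisoformat(row[0]) + CLEANUP_INTERVAL


def cleanup_is_due(conn, now):
    upcoming = next_cleanup_time(conn)
    return upcoming is None or now >= upcoming


conn = sqlite3.connect(":memory:")
create_schema(conn)
record_cleanup(conn, datetime(2025, 1, 1, 0, 0), 42)
due = cleanup_is_due(conn, datetime(2025, 1, 1, 3, 0))
```

Because every node reads the same database, all nodes arrive at the same next execution time, regardless of what is in their caches.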

Inspired by Digi92

Verification issues

With the new version, mosparo stores the verification issues together with the submission. If the verification fails, you can later see why it failed. Until now, you had to enable the API debug mode and store the result from the API somewhere to understand the issue with a submission.

With this change and the change to highlight the invisible characters, we think that it’s much easier to integrate mosparo into a new system.

Inspired by thelfensdrfer

Highlight invisible characters

When dealing with verification issues, a common problem is that spaces may be accidentally removed from the beginning or end of a field’s value. If the backend of a website trims the value and removes those spaces, the verification will fail because the values are not the same.

To make such a problem easier to debug, we’ve added a feature that highlights all invisible characters in a value (except regular spaces) as well as all spaces at the beginning and the end of a value.

With this modification, it’s easier to see that the verification may fail because of a removed or wrongly decoded hidden character.
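To show the kind of output such highlighting produces, here is a small Python sketch. It is an assumption-based illustration, not mosparo’s implementation: the function name, the `[SP]` and `[U+XXXX]` markers, and the choice of Unicode categories treated as invisible are all made up for this example.

```python
import unicodedata

# Unicode categories treated as invisible here: control, format, and
# separator characters (an assumption made for this sketch).
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def highlight_invisible(value):
    """Mark leading/trailing spaces and all other invisible characters."""
    leading_end = len(value) - len(value.lstrip(" "))
    trailing_start = len(value.rstrip(" "))
    parts = []
    for index, char in enumerate(value):
        if char == " ":
            # Regular spaces are only marked at the edges of the value.
            at_edge = index < leading_end or index >= trailing_start
            parts.append("[SP]" if at_edge else char)
        elif unicodedata.category(char) in INVISIBLE_CATEGORIES:
            parts.append(f"[U+{ord(char):04X}]")
        else:
            parts.append(char)
    return "".join(parts)


marked = highlight_invisible(" John\u200b Doe ")
```

For the value above, which contains a zero-width space after “John”, the result is `[SP]John[U+200B] Doe[SP]`. The space inside the value stays untouched, while the edge spaces and the hidden character become visible.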

Suggested by thelfensdrfer

Design: Adjust the checkbox radius and border width

mosparo allows you to customize many visual settings of the mosparo box. But one of the most important things was only customizable via CSS: the checkbox. When integrating mosparo into some of our pages, we noticed that the checkbox’s border is very bold. We used CSS to adjust the border manually, but we didn’t change anything in mosparo.

The user camlafit asked us why the checkbox is round and not square (like in other CAPTCHA solutions). While we have our reasons (in short, the checkbox cannot be unchecked, so in theory it is more of a radio button, which is a circle by default), we reconsidered adding an option to change the design of the checkbox in the design settings.

So, with this new version, mosparo allows the customization of the checkbox. It’s now possible to set the circle radius and change it from a circle to a square. Additionally, the border width of the checkbox can also be adjusted.

Suggested by camlafit

Changes

Renewed validation logic

After optimizing the rules, we realized that our validation logic was not capable of handling a high number of rule items. Because of this, we had to improve the validation logic too.

The old validation logic loaded all rules and then iterated over all rule items to detect spam. This process was very slow and required a lot of resources.

To optimize the process, we adjusted the whole logic. For every rule item, mosparo stores a prepared and a hashed version of it. Simple rule items like IP addresses, country codes, or AS numbers are stored as a hash. Other things, like word patterns or subnets, are stored as a SQL pattern.

When mosparo validates a form field, it builds one SQL query to find all the possible rule items that need to be verified, depending on the field type. After executing that one query, mosparo iterates over all found items and calculates the spam rating. Using hashed values to find matching values is incredibly fast and very accurate. Other items, such as regular expression patterns, are executed via PHP, so they are not faster than before.
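The following Python sketch illustrates the principle with SQLite. The table layout, the column names, the SHA-256 hash, and the use of `LIKE` patterns are assumptions chosen for the example; mosparo’s actual schema and queries may differ, and the filter on the field type is left out to keep the sketch short.

```python
import hashlib
import sqlite3


def hash_value(value):
    return hashlib.sha256(value.lower().encode("utf-8")).hexdigest()


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rule_item (id INTEGER PRIMARY KEY, type TEXT, "
        "hashed_value TEXT, sql_pattern TEXT, rating REAL)"
    )
    conn.executemany(
        "INSERT INTO rule_item (type, hashed_value, sql_pattern, rating) "
        "VALUES (?, ?, ?, ?)",
        [
            # Simple items are stored as a hash and found by exact match.
            ("email", hash_value("spam@example.com"), None, 5.0),
            # Pattern items are stored as an SQL pattern.
            ("word", None, "%casino%", 3.0),
            ("word", None, "%lottery%", 2.0),
        ],
    )
    return conn


def find_rule_items(conn, value):
    # One query returns every rule item that applies to this value.
    return conn.execute(
        "SELECT id, rating FROM rule_item "
        "WHERE hashed_value = ? "
        "OR (sql_pattern IS NOT NULL AND ? LIKE sql_pattern) "
        "ORDER BY id",
        (hash_value(value), value.lower()),
    ).fetchall()


def spam_rating(conn, value):
    # Iterate over the found items only, not over all rule items.
    return sum(rating for _, rating in find_rule_items(conn, value))


conn = build_db()
rating = spam_rating(conn, "Win at our casino and lottery")
```

The database does the filtering, so the application only iterates over the few rule items that actually match, no matter how many items are stored.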

In combination with the added cache for rules and rule packages, the new validation logic is very powerful.

Thank you

Many of the new features were suggested or inspired by some of our users. We’re very thankful for all your ideas, suggestions, and problems that you reported to us, so we can investigate, implement, or fix them.

If you have any suggestions to enhance or optimize mosparo, you can send them to us by creating a post in the Discussions section on GitHub.