Why is it a problem when bots post unwanted material in website forms?

At first glance, bots posting unwanted material through your website's forms may seem like a harmless nuisance. Yet the impact of this phenomenon should not be underestimated.

You can imagine a website owner simply ignoring or deleting contact requests that clearly do not come from genuine people. For a simple contact form, that approach works. The problem arises when a website handles visitor-submitted data more interactively, and when it is less clear exactly what visitors are allowed to upload.

Modern websites often offer several ways for users to submit data to the site's database. No longer limited to a single contact form, these range from public blogs to medical appointment forms, government forms, job applications, and feedback and complaint forms.

Why does this happen?

Let's start by looking at some of the main reasons why certain people or organizations develop these technologies.

1. Advertising & promotion
The main reason for mass-uploading content to various websites is to promote products and services and to spread unwanted advertising. This irrelevant content is meant to attract the attention of website owners, moderators and users. It happens on a large scale, sometimes even through criminal organizations that rent out their botnet as a paid service to promote larger companies or individuals.

2. Phishing attempts
Phishing is the imitation of websites or services in order to carry out malicious activities with the data collected. These techniques are used in various ways, such as uploading malicious forms or scripts that appear on the website as fake login pages, redirects to phishing websites, or malicious links in advertisements. Uploaded scripts can also record keystrokes and so capture passwords as they are typed.

3. SEO manipulation
Another reason to unleash bots en masse on website forms is to manipulate SEO. Link spam is the mass posting of links to other websites, articles or blogs. The content behind these links is often low quality, irrelevant or inappropriate. The same goes for keyword stuffing (posting irrelevant keywords) and for posting incorrectly structured HTML content. The aim is to lower the SEO scores of competing websites and reduce their traffic.

4. Denial-of-service (DoS) attacks
A denial-of-service (DoS) attack overloads a web server, in this case through excessive form submissions on a website. Although hackers often do this to show off their programming skills, it can have serious consequences. For example, during sales or special promotions, online shops may become the target of a DoS attack by competitors looking to lure away customers. After all, no one wants to buy gifts on a slow or unusable website. Other reasons why someone may be the target of a DoS attack include ideological motives of activist groups or financial gain for extortionists.

At first glance, the impact of these automated submissions may seem limited. However, because these bots operate on a large scale, the overall impact can be significant.

What can you do about it?

As you can see, it is important to consider the risks you face if you fail to protect your forms. Fortunately, several techniques now exist to combat this problem. Let's take a closer look at them.

1. Form validation and sanitization
For every submitted form, you can validate whether it contains invalid characters or text that is too long or too short. If this validation fails, it suggests that the form may not have been filled in naturally. The second step consists of detecting and removing any malicious content such as SQL, HTML or JavaScript code. This way you can counter attacks such as SQL injection or cross-site scripting (XSS).
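
The validation and sanitization steps described above could be sketched roughly as follows. This is a minimal illustration, not a complete defense: the length limits, the function names and the `messages` table are assumptions made for the example. Note also that instead of trying to strip out dangerous content, this sketch uses two common alternatives: it escapes HTML on output and passes user input to the database through a parameterized query.

```python
import html
import re
import sqlite3
from typing import List

# Hypothetical limits; tune them to what your form actually expects.
MIN_LEN, MAX_LEN = 10, 2000
# ASCII control characters other than tab, newline and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def validate_message(text: str) -> List[str]:
    """Return a list of validation problems; an empty list means 'looks fine'."""
    errors = []
    stripped = text.strip()
    if len(stripped) < MIN_LEN:
        errors.append("too_short")
    if len(stripped) > MAX_LEN:
        errors.append("too_long")
    if CONTROL_CHARS.search(text):
        errors.append("invalid_characters")
    return errors


def sanitize_for_html(text: str) -> str:
    """Escape HTML special characters so the text is displayed, not executed."""
    return html.escape(text.strip())


def store_message(conn: sqlite3.Connection, text: str) -> None:
    """Insert via a parameterized query so the input is never parsed as SQL."""
    conn.execute("INSERT INTO messages (body) VALUES (?)", (text,))
```

A submission would typically be rejected when `validate_message` returns any errors, stored with `store_message` otherwise, and passed through `sanitize_for_html` whenever it is shown on a page.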

2. Honeypot field

A second simple technique to counter bots is to add a honeypot field. A form usually consists of structural HTML elements with CSS applied to them. The honeypot is a field in the form that is hidden through CSS, so it is present in the HTML but invisible to ordinary users. You therefore expect that genuine visitors will leave this field empty. Bots, like bees drawn to honey, tend to fill in every field they find in the document's HTML, regardless of the CSS applied. If a form is submitted with the hidden field filled in, you can conclude that it was probably not filled out by a real user and choose to ignore the message.
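
As a rough sketch of the honeypot idea, the server-side check can be as small as the function below. The field name `website`, the CSS class `hp` and the markup in `FORM_HTML` are hypothetical choices for this example; any field name that looks plausible to a bot would do.

```python
# Hypothetical form markup: the "website" field is hidden from people via CSS
# (for example `.hp { display: none; }`), but it is still present in the HTML.
FORM_HTML = """
<form method="post" action="/contact">
  <input type="text" name="name">
  <textarea name="message"></textarea>
  <input type="text" name="website" class="hp" tabindex="-1" autocomplete="off">
  <button type="submit">Send</button>
</form>
"""

HONEYPOT_FIELD = "website"  # assumed name, must match the hidden input above


def is_probable_bot(form_data: dict) -> bool:
    """True when the hidden honeypot field was filled in."""
    value = form_data.get(HONEYPOT_FIELD) or ""
    return bool(value.strip())
```

A submission for which `is_probable_bot` returns `True` can be dropped silently, so the bot gets no hint about why it failed.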

3. Time-based submissions
A third simple technique is time-based submission checking. You may have noticed that many applications do not allow an unlimited number of login attempts. This technique builds on the same idea and consists of two parts. First, through session identification between your browser and the web server, it is possible to detect whether the same user is submitting forms unnaturally fast in succession. If this happens, we can consider this user a bot and blacklist them. Second, we can keep a timestamp of the moment the page containing the form was requested. When the form is posted, we can see how long the user took to fill it out, and if this was done too quickly, we can also mark this user as a bot.
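
Both parts of this technique could be sketched as follows. The class name, the thresholds (3 seconds minimum fill time, at most 5 submissions per 60 seconds) and the in-memory storage are assumptions for illustration; a real application would more likely keep this state in its session store.

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds; choose values that fit your own forms.
MIN_FILL_SECONDS = 3.0
MAX_SUBMISSIONS = 5
WINDOW_SECONDS = 60.0


class SubmissionTimer:
    """Tracks, per session, when the form was served and how often it is posted."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._rendered_at = {}               # session id -> time the form was served
        self._recent = defaultdict(deque)    # session id -> recent submission times

    def form_rendered(self, session_id: str) -> None:
        """Call when the page containing the form is requested."""
        self._rendered_at[session_id] = self._clock()

    def is_suspicious(self, session_id: str) -> bool:
        """Call when the form is posted; True if it came too fast or too often."""
        now = self._clock()
        recent = self._recent[session_id]
        # Forget submissions that fall outside the sliding window.
        while recent and now - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        recent.append(now)
        too_many = len(recent) > MAX_SUBMISSIONS

        # A post without a preceding page request is treated as too fast.
        rendered = self._rendered_at.pop(session_id, None)
        too_fast = rendered is None or now - rendered < MIN_FILL_SECONDS
        return too_many or too_fast
```

The `clock` parameter exists only so the timing logic can be exercised with a fake clock instead of real waiting.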

4. Captcha

Perhaps the best known and also the most powerful form of protection against bots is captcha technology. Captcha stands for "completely automated public Turing test to tell computers and humans apart". You can recognize the first version of captcha by the heavily distorted frame containing a random word made up of upper and lower case letters. Only if you type out the full word correctly will you be accepted as a legitimate user. The word is actually an image, so there is no text for a script to read.

When you submit your answer to the puzzle, it is validated server-side, after which an action can be taken, such as sending the form. The puzzle itself can be generated either server-side or client-side. Although the captcha principle has since been expanded into reCaptcha v1, v2 and v3, the original captcha is not often used anymore. It is quite easy to bypass these days, and it only checks whether the textual answer to the puzzle is correct.
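
To make the server-side validation step concrete, here is a simplified sketch of a text captcha check. It is an illustration only: rendering the word as a distorted image is left out, the function names are invented for this example, and it uses a signed token (HMAC) rather than any particular captcha library. A token like this can be replayed, so a real deployment would also tie the challenge to the server-side session and let it expire.

```python
import hashlib
import hmac
import secrets
import string

# Server-side secret; in practice it would be loaded from configuration.
SECRET_KEY = secrets.token_bytes(32)


def _sign(word: str) -> str:
    """Keyed hash of the word; cannot be reproduced without SECRET_KEY."""
    return hmac.new(SECRET_KEY, word.encode("utf-8"), hashlib.sha256).hexdigest()


def new_challenge(length: int = 6):
    """Return (word, token): the word is drawn into an image, the token goes in the form."""
    word = "".join(secrets.choice(string.ascii_letters) for _ in range(length))
    return word, _sign(word)


def verify_answer(answer: str, token: str) -> bool:
    """Validate the submitted answer server-side (case-sensitive)."""
    return hmac.compare_digest(_sign(answer), token)
```

Only the token and the image reach the browser; the word itself never appears as text in the page.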

ReCaptcha is the successor to captcha and is now offered by Google. It can be recognized by its more complex puzzles, such as selecting the boxes that contain traffic lights. The puzzles generally require more actions, and the user's behavior while solving them is now also taken into account. Mouse behavior, click behavior, speed and browser properties are evaluated by a trained model to distinguish between humans and robots. Next came reCaptcha v2, where the puzzle was replaced with a checkbox. This was done because many people find puzzles tedious, and the algorithms that analyze user behavior on the website are powerful enough to distinguish robots from humans. Finally, there is reCaptcha v3, which no longer displays anything in the browser and tracks user behavior in the background to evaluate it when the form is sent.

5. User behavior analysis, machine learning and AI
Although this principle is closely related to captcha, captcha is not the only form of artificial intelligence that can be deployed. AI is a hot topic these days and can be used for many purposes, including form validation. There are already powerful, extensively trained models that can distinguish whether a form is being filled out by a person or by a robot. While the form is being filled out, various signals are captured, such as mouse movement, the order in which fields are completed, mouse speed, click speed and much more. By training on many examples of forms filled out by bots and by humans, a model can learn to tell the two apart. It gets really interesting when you consider that this technology can be offered as a service or API. Unfortunately, there are major drawbacks: such technologies are fairly advanced and often cost money, and today's bots may be equipped with similar technology and will mimic humans ever better.
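
As a toy illustration of learning from labelled examples, the sketch below uses a nearest-centroid classifier on two behavioral features. Everything here is an assumption for the sake of the example: the features (seconds to fill the form, number of mouse-move events), the invented sample values, and the classifier itself, which is far simpler than the trained models such services actually use.

```python
from math import dist
from statistics import mean

# Invented example data: (seconds_to_fill, mouse_move_events) per submission.
HUMAN_EXAMPLES = [(35.0, 120), (50.0, 200), (28.0, 90)]
BOT_EXAMPLES = [(0.5, 0), (1.2, 2), (0.8, 0)]


def centroid(rows):
    """Average each feature column to get one representative point."""
    return tuple(mean(column) for column in zip(*rows))


def train(human_rows, bot_rows):
    """'Training' here just stores one centroid per label."""
    return {"human": centroid(human_rows), "bot": centroid(bot_rows)}


def classify(model, features):
    """Return the label whose centroid is closest to the observed features."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(HUMAN_EXAMPLES, BOT_EXAMPLES)
```

With this data, a submission that took 40 seconds with 150 mouse events is classified as `"human"`, while one that took 1 second with a single mouse event is classified as `"bot"`. A realistic model would use many more features, scale them, and be trained on far more data.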

Closing words

There are several techniques to combat bots on forms, and they vary in effectiveness. Some are easy to circumvent, while others are complex and nearly impenetrable. At first glance, securing a form may not seem necessary, because bots do not appear to do any damage if they are let through. Still, it is important to protect forms, even those that seem to have a trivial function. Bots today are a common and annoying phenomenon that occurs en masse. They can do significant damage, especially on forms that require discretion. It is therefore advisable to combine several techniques, depending on the specific requirements of your situation.
