The Rise of Group Approaches

The difficulties of detecting complex bots using early approaches quickly gave rise to a new line of research. Since 2012–2013, several different teams have independently proposed new systems that, despite being based on different methods and implementations, share the same concepts and philosophy.

As shown schematically in Figure 3 (Part B), the main characteristic of these new systems is their focus on groups of accounts rather than individual accounts. The rationale behind this choice is that bots act in coordination with other bots, forming botnets to amplify their impact [40].

The existence of botnets does not necessarily mean that the accounts are explicitly linked on the social network, but rather that they are controlled by a single entity and share common goals. As a result, botnets leave behind more traces of automation and coordination than sophisticated individual bots [5].

Developing methods to detect suspiciously coordinated and synchronized behavior will likely yield better results than analyzing individual accounts. Additionally, by analyzing large groups of accounts, detectors gain access to more data to feed powerful, but data-hungry, AI algorithms.

In 2018, about five years after the emergence of the group-based approach to bot detection, Facebook [h] and Twitter [i] also recognized the importance of focusing on coordinated and inauthentic behavior.

The second common feature of most group detectors is that they make important algorithmic contributions, enabling a shift from general-purpose machine learning algorithms, such as support vector machines and decision trees, to specialized algorithms designed specifically for bot detection. Many group detectors are based on unsupervised or semi-supervised approaches. The idea is to overcome the generalization shortcomings of supervised detectors, which are severely limited by the scarcity of comprehensive and reliable training datasets [11].
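The supervised/unsupervised distinction can be illustrated with a toy, standard-library-only sketch. The single "tweets per day" feature, the tiny dataset, and the largest-gap clustering rule are all illustrative assumptions, not part of any detector discussed here:

```python
# Supervised vs. unsupervised bot detection in miniature.
# Each account is reduced to one toy feature: tweets per day.

# Supervised: requires labeled training examples.
labeled = [(5, "human"), (8, "human"), (120, "bot"), (150, "bot")]

def supervised_predict(x: float) -> str:
    """1-nearest-neighbor prediction using the labeled training set."""
    return min(labeled, key=lambda p: abs(p[0] - x))[1]

def unsupervised_clusters(xs: list[float]) -> tuple[list[float], list[float]]:
    """Unsupervised: no labels; split accounts at the largest gap
    between sorted feature values."""
    xs = sorted(xs)
    gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return xs[:cut], xs[cut:]

print(supervised_predict(100))                  # "bot"
print(unsupervised_clusters([5, 8, 120, 150]))  # ([5, 8], [120, 150])
```

The point of the contrast: the supervised function is only as good as its labeled dataset, while the clustering rule needs no labels but gives no semantics to its two groups.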

To quantitatively demonstrate the growth of group-based approaches to bot detection, Figure 5 shows the results of an extensive longitudinal classification. We examined more than 230 papers that proposed a bot detection technique and manually classified each detector along two orthogonal dimensions. The first dimension (Part A) indicates whether a detector targets individual accounts or groups of accounts. The second dimension (Part B) classifies detectors according to their high-level approach to the task.

In particular, detectors are classified based on the following:

- heuristics, that is, simple hand-crafted rules;
- crowdsourcing, that is, relying on the judgments of human experts;
- supervised machine learning, for example classification, which requires a labeled training dataset;
- unsupervised machine learning, for example clustering, which does not require labeled data for training;
- adversarial approaches, including adversarial machine learning.
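The first category, heuristics, can be sketched as a short set of hand-written red-flag rules. The feature names and thresholds below are illustrative assumptions, not rules taken from any specific detector in this classification:

```python
# Toy sketch of the "heuristics" family: simple rules score an account
# as bot-like. All thresholds here are hypothetical.

def heuristic_bot_score(account: dict) -> int:
    """Count how many red-flag rules an account triggers."""
    rules = [
        # Follows many accounts but almost nobody follows back.
        account.get("followers", 0) < 10 and account.get("friends", 0) > 1000,
        # Implausibly high posting rate.
        account.get("tweets_per_day", 0) > 100,
        # Profile was never customized.
        account.get("default_profile_image", False),
    ]
    return sum(rules)

suspicious = {"followers": 3, "friends": 2500,
              "tweets_per_day": 150, "default_profile_image": True}
print(heuristic_bot_score(suspicious))  # 3: triggers all three rules
```

Such rules are cheap and interpretable, which explains their popularity in early detectors, but they are also exactly what the complex bots described earlier learned to evade.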
To better explain the methodology, here are some examples showing how known bot detectors were classified. The system proposed by Ruan et al. [26] is designed to detect compromised accounts – originally legitimate ones that have been taken over by an attacker.

Initially, it creates a behavioral profile for each account under investigation. The system can then identify compromised accounts via anomaly detection, flagging behavior that deviates significantly from the corresponding profile. This system is classified as an individual-account detector (since the behavioral profile of an account depends solely on its own actions) and as an unsupervised detector (since it relies on anomaly detection).
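A minimal sketch of this per-account anomaly-detection idea follows. Reducing the behavioral profile to a single "posts per day" feature and using a 3-sigma threshold are simplifying assumptions for illustration, not details of the system by Ruan et al.:

```python
# Per-account anomaly detection: profile an account from its own history,
# then flag days that deviate strongly from that profile.
from statistics import mean, stdev

def build_profile(daily_posts: list[int]) -> tuple[float, float]:
    """Profile = mean and standard deviation of the account's own history."""
    return mean(daily_posts), stdev(daily_posts)

def is_anomalous(profile: tuple[float, float], today: int, k: float = 3.0) -> bool:
    """Flag a day whose activity deviates more than k sigma from the profile."""
    mu, sigma = profile
    return abs(today - mu) > k * sigma

history = [5, 7, 6, 4, 8, 6, 5]   # the account's normal behavior
profile = build_profile(history)
print(is_anomalous(profile, 6))   # False: a typical day
print(is_anomalous(profile, 90))  # True: a burst suggesting a takeover
```

Note that no labels are involved: the "training data" is simply the account's own past, which is why such systems are classified as unsupervised.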

Conversely, another system looks for suspiciously large similarities between the sequences of actions of large groups of accounts [6]. Each account’s activity is encoded as a string, and the similarity between account actions is calculated by applying the longest common subsequence metric to such strings. Suspiciously long subsequences between activity strings are identified using spike detection, and all those accounts that share a common subsequence of actions are labeled as bots.

Given these characteristics, this work is classified as a group-based bot detector (since it analyzes a group of accounts to find similar action sequences) and as an unsupervised machine learning approach (since it uses an unsupervised spike-detection algorithm).
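The core of this group-based idea can be sketched compactly. The three-letter action alphabet and the fixed flagging threshold below are illustrative assumptions; the original work identifies suspicious subsequence lengths via spike detection rather than a hard cutoff:

```python
# Group-based detection via longest common subsequence (LCS) of action strings.

def lcs_length(a: str, b: str) -> int:
    """Classic dynamic-programming LCS length in O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b
                        else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def flag_coordinated(streams: dict[str, str], threshold: int) -> set[str]:
    """Flag every pair of accounts whose activity strings share an
    LCS at least as long as the threshold."""
    accounts = list(streams)
    flagged = set()
    for i, u in enumerate(accounts):
        for v in accounts[i + 1:]:
            if lcs_length(streams[u], streams[v]) >= threshold:
                flagged |= {u, v}
    return flagged

# T = tweet, R = retweet, F = follow: two accounts replay the same script.
streams = {"bot1": "TRFTRFTRF", "bot2": "TRFTRFTRF", "human": "TFRRT"}
print(flag_coordinated(streams, threshold=8))  # flags bot1 and bot2
```

The key property is that no single account looks suspicious in isolation; only the comparison across the group reveals the coordination.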

Going beyond the two previous examples, several interesting patterns emerge from the classification. The vast majority of techniques that perform network analysis, for example on the social network graph or on account interactions, are group-based, and they most often adopt unsupervised approaches. In contrast, all techniques based on the analysis of the textual content of published messages, such as works that rely exclusively on natural language processing, are supervised detectors that analyze individual accounts.

CACM: A Decade of Social Bot Computing

Figure 5. Longitudinal categorization of 236 bot detectors published since 2010

Figure 5 legend: Data points indicate the number of new detectors of each type published in a given year. Part A classifies detectors as focused on either single-account analysis or group-of-accounts analysis. Part B classifies the same detectors based on their high-level approach to the problem. Both parts clearly document the rise of a new approach to bot detection characterized by group analysis and a plethora of unsupervised detectors.

Interestingly, the plateau reached by unsupervised approaches since 2017 has occurred in conjunction with the recent rise in the number of adversarial approaches.

Axis and line labels: the y-axis reports the number of new bot detectors; the lines in Part A are group and individual, and the lines in Part B are adversarial, crowdsourcing, heuristics, supervised, and unsupervised.
