The first work specifically devoted to detecting automated social media accounts dates back to January 2010 [37]. In the early days, most attempts at detecting bots had two distinctive features: they were based on supervised machine learning and on the analysis of individual accounts. In other words, given a group of accounts to analyze, detectors were applied to each account in the group individually, assigning it a binary label (bot or human).
This approach to bot detection is schematically presented in Part A of Figure 3. Here, the key assumption is that bots and humans are clearly separable and that each malicious account has individual characteristics that make it distinct. This approach to the problem of social bot detection also relies on applying ready-made general-purpose classification algorithms to the accounts under investigation and on engineering effective machine-learning features to distinguish bots from legitimate accounts.
CACM: A Decade of Social Bot Computing
Figure 3. Differences between early and group approaches to detecting social bots
Captions in Figure 3: In early approaches (Part A), a supervised detector is applied separately to each account under investigation. Unless the bot appears very different from a human-controlled account, as is the case with recently emerged bots, it will likely not be detected.
In more recent approaches (Part B), the detector analyzes a group of accounts, looking for traces of coordinated and synchronized behavior. Large groups of coordinated accounts are more likely to be detected than complex individual bots. However, prediction errors may still occur for small groups of poorly coordinated bots that may provide insufficient information to detect them, or for groups of humans that may appear automated. These issues currently represent unsolved problems in the field.
Icons
The face in the hat is a bot detector.
An empty circle is a person.
The dotted square is the target of the bot detector.
Red circle – old type of bot.
Pink circle – evolved type of bot.
Yellow square – account marked as a bot.
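The group-based idea sketched in Part B of Figure 3 can be illustrated with a toy coordination check: instead of scoring each account in isolation, compare the activity traces of accounts pairwise and flag those that overlap suspiciously. The encoding of an "action" and the 0.7 threshold below are illustrative assumptions, not a published method.

```python
# Toy sketch of group-based detection: flag pairs of accounts whose
# activity traces are near-identical, a possible sign of coordination.
# The action encoding and the threshold are illustrative assumptions.
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap between two action sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def coordinated_pairs(timelines: dict, threshold: float = 0.7) -> list:
    """timelines maps an account id to its set of actions,
    e.g. (hour, hashtag) tuples; returns suspiciously similar pairs."""
    return [
        (u, v)
        for u, v in combinations(sorted(timelines), 2)
        if jaccard(timelines[u], timelines[v]) >= threshold
    ]
```

A single bot in such a group may look perfectly human on its own; only the near-duplicate traces across accounts give the group away, which is exactly why small or loosely coordinated groups (the open problem noted in the caption) remain hard to catch.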
For example, Cresci and others have developed a set of supervised machine-learning classifiers to detect so-called fake followers, a type of automated account typically used to artificially boost the popularity of the public figures who buy them [4]. Fake followers can be purchased for as little as $12 per thousand on the public web. As a result, such bots are quite common.
The authors analyzed about 3,000 fake followers obtained from different providers and found that the simplistic nature of these accounts makes them fairly easy to detect, even using just 19 features that are neither data- nor computationally intensive [4]. Fake followers do not need to perform complex tasks such as creating content or participating in conversations.
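The lightweight, cheap-feature approach described above can be sketched as follows. The specific features, red-flag rules, and threshold are illustrative assumptions, not the published 19-feature set.

```python
# Hypothetical sketch of a lightweight fake-follower check using a few
# cheap profile features, in the spirit of the approach described above.
# Feature choices and thresholds are illustrative, not the published model.

def fake_follower_score(account: dict) -> int:
    """Count simple red flags computable from profile metadata alone."""
    followers = account.get("followers", 0)
    friends = account.get("friends", 0)
    score = 0
    if not account.get("bio"):
        score += 1  # empty biography
    if not account.get("profile_image"):
        score += 1  # default avatar
    if account.get("tweets", 0) < 10:
        score += 1  # fake followers rarely post content
    if followers / max(friends, 1) < 0.01:
        score += 1  # follows many accounts, followed back by almost none
    return score


def is_fake_follower(account: dict, threshold: int = 3) -> bool:
    return fake_follower_score(account) >= threshold
```

In practice such rule-like features would feed a trained classifier rather than a hand-set threshold, but the point stands: no content analysis or network crawling is needed to catch these simplistic accounts.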
Other detection systems use a large number of machine-learning features to detect social bots. Using over 1,200 account features, Botometer evaluates potential bots based on their profile characteristics, social network structure, the content they create, sentiment expressions, and the timing of their actions [35]. Rather than focusing on a specific type of bot, as Cresci et al. did, Botometer is a “general purpose” bot detector. However, the detector’s versatility and ease of deployment come at the cost of accuracy [5][17].
The two previous detectors simultaneously analyze several classes of characteristics of suspicious accounts to detect possible bots. Other systems instead focus solely on network characteristics, the textual content of posted messages, or profile information. These systems are generally easier to use because they analyze only one aspect of complex bot behavior.
Despite promising initial results, these early approaches have several shortcomings. The first challenge in developing a supervised detector is the availability of a dataset to use in the classifier training phase. In most cases, there is no reliable ground truth, and labels are simply assigned by operators manually analyzing the data. Critical issues arise as a consequence of differing definitions of social bots, leading to different labeling schemes [18]. Moreover, it has been shown that humans suffer from interpretation biases and are largely unable to detect sophisticated bots: in one experiment, only 24% of bots were correctly labeled as such by human annotators [5].
Moreover, these approaches typically output binary classifications. However, in many cases, malicious accounts exhibit a mix of automated and human-driven behavior that cannot be captured by a simple binary label. To make matters worse, a serious shortcoming of individual detectors is caused by the evolutionary nature of social bots.
The Problem of Bot Evolution
The initial successes in detecting social bots forced bot developers to implement sophisticated countermeasures. As a result, newer bots often have characteristics that make them much more difficult to detect. This vicious circle drives the development of increasingly sophisticated social bots and is commonly referred to as bot evolution.
Work published between 2011 and 2013 by Chao Yang and other researchers provided the first evidence and theoretical foundations for studying the evolution of social bots [34]. The first wave of social bots that populated social networks around 2011 consisted of rather simplistic bots. These were accounts with very low reputation due to a small number of social connections and published messages, and with clear signs of automation, as shown in Part A of Figure 4.
By contrast, the evolved social bots later studied by Chao Yang et al. were more popular and credible, given their relatively large number of social connections, and they no longer spammed the same messages over and over again.