Obstacles faced by internet governance researchers in analysing social networks
Written by
Lucas Anjos
September 2, 2019
One of the main problems facing internet governance researchers today is how to analyze data from social networks and private messaging platforms. Many of the phenomena that most intrigue and challenge society today occur online, such as the influence on elections (including disinformation), threats to the exercise of fundamental rights (notably freedom of expression and access to information), and the practice of cyberbullying, among others. For scientists interested in these and other topics, the task remains to map demands, categorize social groups and objects of study, obtain meaningful data, and draw plausible conclusions from concrete results. But how to do this in a reliable, representative and relevant manner? What tools can be used in social network analysis, and what limitations can we expect from these techniques? This post aims to answer these and other questions.
The size of the problem: representativeness
Who is connected? Does what these people talk about online represent society’s conversation as a whole? This is one of the first questions anyone intending to analyze social data needs to ask. In the Brazilian case, the TIC Domicílios survey is a reliable, comprehensive and detailed source describing the profile of internet users in the country. Its latest edition, covering 2018, is now available here.
According to the report, almost 70% of households are connected, but there is still a sharp divide by social class: in classes D and E this percentage drops to 48%. In urban areas the connection rate is 74%, while in rural areas it is 49%. A detailed investigation of online discourse must therefore account for the profiles (socioeconomic, but also geographic) of the users under analysis.
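For instance, an online sample skewed toward well-connected groups can be reweighted against household survey benchmarks before any aggregate claims are made. Below is a minimal post-stratification sketch; the strata and the population and sample shares are hypothetical placeholders, not TIC Domicílios figures.

```python
# A minimal sketch of post-stratification weighting: the strata and the shares
# below are hypothetical placeholders, not TIC Domicílios data.
import pandas as pd

# Hypothetical share of each stratum in the target population (e.g., from a
# household survey) versus its share in the online sample actually collected.
benchmark = pd.DataFrame({
    "stratum": ["urban_AB", "urban_CDE", "rural"],
    "population_share": [0.30, 0.45, 0.25],   # assumed, for illustration only
    "sample_share": [0.55, 0.35, 0.10],       # assumed, for illustration only
})

# Weight = population share / sample share; users from under-represented
# strata count for more in any aggregate statistic.
benchmark["weight"] = benchmark["population_share"] / benchmark["sample_share"]
print(benchmark)
```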
In addition, among the profiles analyzed on social networks such as Facebook and Twitter, the use of bots to mass-reproduce boosted content is increasingly common. Bots are, in essence, accounts that do not represent an individual user (as these networks propose), but rather thousands of accounts controlled by a small number of individuals. They make a lot of noise online (automated comments, reactions, shares and posts) but do not reflect the reality of the individual users who are connected. For this reason, those who undertake the difficult endeavor of analyzing social networks must first “clean” the dataset of these automated accounts, unless the bots themselves are the object of study.
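As a starting point, here is a sketch of what such a cleaning step might look like, using two crude heuristics (posting rate and near-duplicate content). The thresholds and the sample posts are assumptions for illustration only; serious bot detection relies on far richer signals.

```python
# An illustrative heuristic for flagging likely automated accounts.
# Thresholds and sample data are assumptions, not validated parameters.
from datetime import datetime

posts = [  # hypothetical collected posts: (account, timestamp, text)
    ("acct_1", datetime(2019, 9, 2, 10, 0), "Vote for X! #election"),
    ("acct_1", datetime(2019, 9, 2, 10, 1), "Vote for X! #election"),
    ("acct_2", datetime(2019, 9, 2, 11, 0), "Nice weather today"),
]

MAX_POSTS_PER_HOUR = 30      # assumed threshold
MAX_DUPLICATE_RATIO = 0.4    # assumed threshold

def likely_bots(posts):
    by_account = {}
    for account, ts, text in posts:
        by_account.setdefault(account, []).append((ts, text))
    flagged = set()
    for account, items in by_account.items():
        timestamps = [ts for ts, _ in items]
        texts = [text for _, text in items]
        hours = max((max(timestamps) - min(timestamps)).total_seconds() / 3600, 1)
        duplicate_ratio = 1 - len(set(texts)) / len(texts)
        if len(items) / hours > MAX_POSTS_PER_HOUR or duplicate_ratio > MAX_DUPLICATE_RATIO:
            flagged.add(account)
    return flagged

print(likely_bots(posts))  # accounts to exclude before analysing "organic" activity
```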
Another aspect to consider is the personalization these platforms apply to the user experience. Recent research coordinated by Prof. Virgílio Almeida (UFMG and Harvard), for example, sought to map traces of radicalization in YouTube’s video recommendation tool. One of the biggest weaknesses of the research’s central thesis, however, is precisely the personalization the platform applies to the videos it recommends, which takes into account geolocation, browsing history and account-linked interest patterns from Google, among other factors. How can one know whether the study effectively reflects what the platform has offered its users? Researchers who analyze suggestions in search engines, such as Google and Bing, face the same problem.
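One way to at least measure the problem is to collect the same recommendation lists under different conditions (for example, a logged-out session versus a personalized account) and compare them. The sketch below computes a simple Jaccard overlap; the video IDs are hypothetical placeholders.

```python
# A minimal sketch for gauging how much personalization distorts a crawl:
# compare recommendation lists collected from different profiles for the
# same seed video. The lists below are hypothetical.

def jaccard(a, b):
    """Overlap between two sets of recommended video IDs (1.0 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

logged_out = ["vid_A", "vid_B", "vid_C", "vid_D"]
logged_in = ["vid_A", "vid_E", "vid_F", "vid_D"]

print(f"recommendation overlap: {jaccard(logged_out, logged_in):.2f}")
# A low overlap suggests results are heavily personalized, so findings from a
# single crawl may not reflect what other users actually see.
```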
Privacy and ethics in research
The platforms most commonly used as social networks, such as Facebook, Twitter, Instagram and YouTube, have different privacy settings. Users can increasingly configure their accounts to apply higher or lower levels of privacy to their posts, comments, reactions, friend lists, publicly available videos and photos, among other degrees of individualization. This is positive from the standpoint of users’ right to privacy, but it greatly limits the scope of analysis available to researchers interested in mapping profiles, discourse and other metrics generated on these platforms.
Even users who leave their profiles, comments and posts public, which would be ideal for those looking to analyze the metrics generated by their online activity, raise valid questions for network researchers. This public visibility, usually the default for platform subscribers, stems from acceptance of terms and conditions of use, which the same researchers often criticize for the fragility of the consent given to the processing of personal data, the assignment of intellectual property rights over posted content, among other issues.
Regarding privacy, it is also important to mention the ethical limits imposed on researchers who perform ethnographic analysis. Unlike the questionable criteria of advertising and marketing companies, serious academic research demands ethics in the collection, processing and disclosure of the data obtained. This raises concerns about the consent of each investigated subject (if the information is not public), the anonymization of subjects (especially when disclosing final results), and approval by independent committees that review these standards. Although these requirements slow the approval, execution and publication of research, they are minimum standards that guarantee the autonomy and privacy of subjects.
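On the anonymization point, here is a minimal sketch of one common precaution: replacing raw identifiers with salted pseudonyms before any results leave the research environment. The identifiers below are hypothetical, and a salted hash is pseudonymization rather than full anonymization, so it complements (rather than replaces) ethics review.

```python
# A minimal sketch of pseudonymizing user identifiers before results are shared.
# The salt handling is simplified and the identifiers are hypothetical.
import hashlib
import secrets

SALT = secrets.token_hex(16)  # keep this secret and out of any published dataset

def pseudonymize(identifier: str) -> str:
    """Map a real identifier to a stable, non-reversible research ID."""
    digest = hashlib.sha256((SALT + identifier).encode("utf-8")).hexdigest()
    return f"user_{digest[:10]}"

raw_records = [("@alice_example", "post about topic X"),
               ("+55 31 99999-0000", "message in group Y")]

anonymized = [(pseudonymize(identifier), text) for identifier, text in raw_records]
print(anonymized)
```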
Technique as an essential tool for analysing social networks
Many of the researchers interested in the behavior of social network users come, naturally enough given their objects of study, from the traditional humanities and applied social sciences, such as Sociology, Political Science, Psychology, Law and Anthropology, among others. However, understanding how these online networks operate requires, at the very least, basic knowledge of the internet’s infrastructure, of the Over-the-Top (OTT) applications that use that infrastructure to offer products and services, and, finally, of tools for collecting and analyzing the aggregated data these networks produce.
This need presents these scientists with the challenge of conducting increasingly multidisciplinary research, not only with regard to the object of study (which in itself involves the logic of computer science, behavioral economics etc.), but also in the composition of their teams. A working group of data scientists, programmers, new media specialists and traditional social scientists is more likely to use appropriate behavioral investigation tools, with a larger pool of subjects and a quicker timeframe for data analysis. As social science researchers, we must be able and willing to improve ourselves by learning data analysis software and ancillary tools (you can see a few examples here), or at least to work with teams that are increasingly diverse in their training.
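To give a sense of what such tools enable, here is a minimal sketch of a typical first step: building a mention network from collected posts and ranking accounts by how much attention they receive. The posts and accounts are hypothetical; a real project would start from an API export or a scraping pipeline.

```python
# A minimal sketch of tool-assisted network analysis with hypothetical data.
import networkx as nx

mentions = [  # (author, mentioned_account) pairs extracted from collected posts
    ("ana", "bruno"), ("ana", "carla"), ("bruno", "carla"),
    ("carla", "ana"), ("davi", "carla"),
]

graph = nx.DiGraph()
graph.add_edges_from(mentions)

# Which accounts receive the most attention in this (tiny) conversation?
centrality = nx.in_degree_centrality(graph)
for account, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(account, round(score, 2))
```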
Still, we have to deal with constant change in the way these platforms work, which are, moreover, protected by trade secrets that reduce the transparency of their algorithms. Often, research time is not the same as the market’s, or society’s. This means that between the moment data is collected and the moment results are analyzed and published, significant changes may occur in how algorithms present posts to particular users, in the number of possible reactions to a post, in commenting and sharing features, or in tools abandoned and disabled by platforms, among other developments common to the universe of digital applications. Researchers must therefore update themselves far more frequently than they were accustomed to (with, say, constant legislative change in the field of law), or risk their research becoming irrelevant or outdated by the time of publication; on top of that, they face the additional problem of analyzing objects of study that are in constant (and substantial) transformation.
From social networks to private chats
A trend observed in recent years has been the growing migration of users from “open” social networks, such as Facebook and Twitter, to private messaging applications (including group chats). This is one of several reasons why the Facebook group has gradually integrated its three platforms (Facebook, WhatsApp and Instagram), fusing audiences, information, data and messaging across them.
The transition from public platforms to private messengers presents an additional obstacle for online speech researchers. What was once traceable through hashtags and keywords now hides behind semi-public groups (the link is not always available to the general public), random mobile numbers and end-to-end encryption. How to map the origin and path of information online (false news and hate speech, for example)? What is the best way to report what you observe in messaging groups if there is no consent from all members? What would be the structure of the message exchange network (something that is most easily mapped on Twitter, for example) if each user’s contacts are not public?
A paradigmatic case that reflects this problem was the truck drivers’ strike in Brazil in 2018. According to several individual reports, the main fora for deliberating, deciding and communicating the strike were on WhatsApp. Researchers and journalists who wanted to examine these groups, to understand how the movement was organized and who created it, had to follow several groups, each with more than 250 members and varied profiles, without being able to reliably identify individual users or their origins for investigative purposes. In the end, how to credibly report the results obtained? The same dilemma arises today in the analysis of groups and message exchanges with polarized political content.
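Even under these constraints, some descriptive analysis is possible when a group member legitimately exports a chat and ethical safeguards are in place. Here is a minimal sketch: parsing an exported transcript and counting messages per pseudonymized participant. The line format is an assumption based on common “export chat” text files and varies by platform and locale; the numbers and messages are invented.

```python
# A minimal sketch of processing an exported group chat transcript.
# The line format is assumed and may differ by platform and locale.
import re
from collections import Counter

LINE = re.compile(r"^(\d{2}/\d{2}/\d{4}) (\d{2}:\d{2}) - ([^:]+): (.*)$")

sample_export = """\
01/06/2018 08:15 - +55 31 99999-0001: Strike starts tomorrow
01/06/2018 08:16 - +55 31 99999-0002: Forwarded: route is blocked
01/06/2018 08:20 - +55 31 99999-0001: Share with other groups
"""

pseudonyms = {}

def pseudonym(sender):
    # Replace phone numbers with stable pseudonyms before any counting.
    return pseudonyms.setdefault(sender, f"member_{len(pseudonyms) + 1}")

message_counts = Counter()
for line in sample_export.splitlines():
    match = LINE.match(line)
    if match:
        _date, _time, sender, _text = match.groups()
        message_counts[pseudonym(sender)] += 1

print(message_counts)  # who drives the conversation, without exposing real numbers
```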
What to expect for the future?
The only certainty this scenario offers is that these problems will not become simpler over time. The complexity of the digital landscape grows steadily, demanding increasingly frequent updates from researchers. Digital platforms are under growing public scrutiny, including through more comprehensive regulatory instruments that require greater transparency from application providers, such as the Lei Geral de Proteção de Dados in Brazil and the General Data Protection Regulation in the European Union. What can be expected from (and offered to) the scientific community is more interdisciplinary research, complementarity with studies in other areas, and the aid of data analysis automation tools, among other developments.
Interested in the topic of personal data protection and information security? Want to read more about it? Try reading this post by Diego Machado on our blog!
The views and opinions expressed in this article are those of the authors.
Written by
Lucas Anjos
Founder and Scientific Advisor of the Institute for Research on Internet and Society. Law Professor at Universidade Federal de Juiz de Fora. Holds a Master’s and a Bachelor’s degree from the Federal University of Minas Gerais, with a scholarship from CAPES (Coordination for the Improvement of Higher Education Personnel, a foundation within the Brazilian Ministry of Education), and is currently a PhD candidate at the same institution. Specialist in International Law from CEDIN (Center for International Law).
Assistant professor in the International Economic Relations and Law courses at the Federal University of Minas Gerais. Lawyer and member of ABRI (Brazilian Association for International Relations).