Two computer scientists, one at Carnegie Mellon and one at McGill, warn researchers that social media data used in behavior studies is particularly biased and cannot be relied on for accurate results.
Juergen Pfeffer and Derek Ruths explain that Facebook and Twitter are treated as a goldmine of insight into people’s thoughts. Scientists believe that the heaps of data these platforms gather can accurately portray what users are thinking. What they overlook, however, is the need to correct the inherent biases those datasets contain.
Researchers landed in a firestorm of criticism after using Facebook to modify users’ news feeds in order to alter their emotions. The study was so poorly handled that it became a PR nightmare, and the “secret experiments” also caught the attention of European watchdogs. The Electronic Privacy Information Center launched an investigation after widespread outrage over the fact that Facebook allowed scientists to conduct experiments without asking users for explicit permission beforehand.
Pfeffer and Ruths wrote in Thursday’s issue of Science that one of the many problems with the social media data used in such behavioral studies is that researchers fail to take moral and privacy issues into account. Normally, they say, universities should decide how far researchers are allowed to go, after convening ethical review boards.
Even more troublesome than the privacy issues, Ruths and Pfeffer say, is that data compiled from social media sites lacks context. Five years ago, the computer scientists say, such research was practically nonexistent. Now, a Google Scholar search for the keyword “twitter” produces more than 4.9 results, which is significantly more than virtually every other existing keyword.
But why is data collected through Facebook and similar sites leading to such bad science? One reason is that each social network attracts its own type of group, so no single platform’s users represent the general population.
“Instagram, for instance, has special appeal to adults between the ages of 18 and 29, African-Americans, Latinos, women and urban dwellers, while Pinterest is dominated by women between the ages of 25 and 34 with average household incomes of $100,000,” the two say.
Moreover, social media sites can change their proprietary algorithms at any time, so the filters applied to the data scientists collect are not always clear. In fact, the data is practically unusable.
Real social science, Pfeffer says, should be conducted with all these issues in mind. Scientists already look for biases in such data using techniques from epidemiology, statistics and machine learning. For those who still insist on using social media data in behavior studies, however, new analysis methods should come into focus.
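To make the idea of correcting for a skewed user base concrete, here is a minimal sketch of one standard statistical technique, post-stratification weighting. It is not a method taken from Pfeffer and Ruths’ article; the age groups, population shares and sample values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def raw_mean(samples):
    """Plain average over (group, value) pairs, ignoring who the users are."""
    return sum(value for _, value in samples) / len(samples)


def poststratified_mean(samples, population_shares):
    """Average each group separately, then weight by its population share.

    samples: list of (group, value) pairs drawn from a platform.
    population_shares: dict mapping group -> share of the real population
                       (assumed to sum to 1).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in samples:
        totals[group] += value
        counts[group] += 1

    # A group with no sampled users cannot be reweighted at all.
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no samples for groups: {sorted(missing)}")

    return sum(
        share * totals[group] / counts[group]
        for group, share in population_shares.items()
    )


# Hypothetical platform sample: 80% of users are 18-29, and only they
# hold the opinion being measured (value 1.0 = yes, 0.0 = no).
samples = [("18-29", 1.0)] * 8 + [("30+", 0.0)] * 2

# Hypothetical real-world shares: the 18-29 group is only a quarter.
population_shares = {"18-29": 0.25, "30+": 0.75}

raw = raw_mean(samples)
adjusted = poststratified_mean(samples, population_shares)
print(f"raw estimate: {raw:.2f}, reweighted estimate: {adjusted:.2f}")
```

In this toy case the unweighted sample suggests that 80 percent of people hold the opinion, while the reweighted estimate is 25 percent. Reweighting only addresses known demographic skew; it does nothing about undisclosed platform filtering or groups that are missing from the sample entirely.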