{"id":111433,"date":"2021-02-25T15:17:00","date_gmt":"2021-02-25T20:17:00","guid":{"rendered":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/decoded\/\/\/contending-with-the-80-20-rule-when-studying-online-behavior\/"},"modified":"2024-04-17T05:13:41","modified_gmt":"2024-04-17T09:13:41","slug":"contending-with-the-80-20-rule-when-studying-online-behavior","status":"publish","type":"decoded","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/decoded\/2021\/02\/25\/contending-with-the-80-20-rule-when-studying-online-behavior\/","title":{"rendered":"Contending with the 80\/20 rule when studying online behavior"},"content":{"rendered":"\n<figure class=\"wp-block-image size-640-wide\"><a rel=\"attachment wp-att-139485\" href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/decoded\/2021\/02\/contending-with-the-80-20-rule-when-studying-online-behavior\/2021-02-25_decoded_featured-png\/\"><img data-dominant-color=\"c9d3d8\" data-has-transparency=\"false\" style=\"--dominant-color: #c9d3d8;\" loading=\"lazy\" decoding=\"async\"  srcset=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?resize=480,270 480w, https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?resize=782,440 782w, https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?resize=960,540 960w, https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?resize=1200,675 1200w, https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?resize=1400,788 1400w\" sizes=\"(max-width: 480px) 480px, (max-width: 782px) 782px, 640px\" height=\"360\" width=\"640\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?w=640\" alt=\"\" class=\"wp-image-139485 not-transparent\" \/><\/a><figcaption class=\"wp-element-caption\">Pew Research Center illustration<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c072\">Much of the work by the&nbsp;<a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/methods\/about-data-labs\/\" rel=\"noreferrer noopener\" target=\"_blank\">Data Labs<\/a>&nbsp;team at Pew Research Center examines online behavior, especially on social media platforms. In recent years, we\u2019ve matched (with their permission) more than 1,000 members of our&nbsp;<a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/methods\/2019\/02\/27\/growing-and-improving-pew-research-centers-american-trends-panel\/\" rel=\"noreferrer noopener\" target=\"_blank\">American Trends Panel<\/a>&nbsp;to their Twitter handles, examined the Facebook and Twitter presences of every member of Congress and built a database of every YouTube channel with at least 250,000 subscribers. These projects and others are aimed at helping us understand how people use social media and gain insights into the content on these platforms.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"8c84\">One issue we often have to contend with in these analyses is the Pareto principle, which is also called the \u201c80\/20 rule.\u201d The Pareto principle holds that in many systems, a minority of cases produce the majority of outcomes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"5ec9\">This relationship can manifest in many different arenas. For example, we\u2019ve found that the most active 10% of Twitter users&nbsp;<a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/internet\/2019\/04\/24\/sizing-up-twitter-users\/\" rel=\"noreferrer noopener\" target=\"_blank\">produce 80% of all tweets<\/a>&nbsp;from U.S. adults; that the 10% of U.S. congressional lawmakers with the most Facebook and Twitter followers&nbsp;<a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/internet\/2020\/07\/16\/congress-soars-to-new-heights-on-social-media\/\" rel=\"noreferrer noopener\" target=\"_blank\">receive roughly 80% of all audience engagement<\/a>; and that 10% of the most popular YouTube channels&nbsp;<a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/internet\/2019\/07\/25\/a-week-in-the-life-of-popular-youtube-channels\/\" rel=\"noreferrer noopener\" target=\"_blank\">produce 70% of all the videos<\/a>&nbsp;from that group.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"1b06\">None of this is inherently problematic. But depending on our method of data collection, it does place limits on the behaviors we\u2019re able to study. Here are a few questions related to the 80\/20 rule that we often work through when studying social media.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"2faf\"><strong>Do we use the mean or the median when examining a behavior?<\/strong>&nbsp;Because many online outcomes (such as posting volume or engagement) are concentrated among a small subset of accounts, the&nbsp;<em>mean value<\/em>&nbsp;of a given behavior is often larger than the&nbsp;<em>median value<\/em>&nbsp;for that same behavior.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"e23b\">Here\u2019s a real-world example of how this might play out. Let\u2019s say we\u2019re trying to figure out whether Republican or Democratic lawmakers have more followers across their accounts on Facebook and Twitter. If we simply run a numerical average for those two groups among members of the 116th Congress, it would appear that Democrats on average have nearly 480,000 followers across these two platforms, while Republicans have just over 260,000 \u2014 a difference of more than 218,000 followers per member.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"cd65\">But if we take a closer look at these two groups, we see that only five members of the House or Senate in the previous Congress had&nbsp;<a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/short-reads\/2021\/01\/25\/though-not-especially-productive-in-passing-bills-the-116th-congress-set-new-marks-for-social-media-use\/\" rel=\"noreferrer noopener\" target=\"_blank\">more than 10 million followers<\/a>&nbsp;across their various social accounts. And four of those five outliers (Bernie Sanders, Kamala Harris, Alexandria Ocasio-Cortez and Elizabeth Warren) are Democrats or caucus with the party. Sure enough, if we use the&nbsp;<em>median<\/em>&nbsp;instead of the mean, we can see that the typical Democratic legislator has just over 61,000 followers while the typical Republican has just over 50,000. That\u2019s still a difference in followers between the two parties but a much smaller one than what we saw when we ran this as a simple numeric average.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"913b\">This example indicates that outside of a small number of hugely popular accounts, the typical Republican lawmaker has a following that\u2019s much closer to that of the typical Democrat than we might have gathered simply from looking at the means. And because we are often trying to measure the experiences of typical users, in our reports we usually report on medians rather than means.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"d517\"><strong>When looking at the prevalence of something on social media, do we measure it by counting&nbsp;<em>posts<\/em>&nbsp;or counting&nbsp;<em>people<\/em>?<\/strong>&nbsp;In our reports, we often present our findings by focusing on people \u2014 for example, the share of U.S. adults on Twitter who have posted about a given topic. That\u2019s partly because the Center has its roots in traditional public opinion research, and we are especially interested in measuring what&nbsp;<em>people<\/em>&nbsp;say and do. But there\u2019s also a practical reason that has to do with the 80\/20 rule.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"3ce2\">Here\u2019s how this might play out. Let\u2019s say we have a collection of 1,000 Twitter accounts belonging to U.S. adults, and 999 of them have tweeted exactly once in the time period we\u2019re interested in. But one person is extremely active \u2014 and maybe even a bit obsessed with one topic. That person has posted 1,000 tweets all on their own during that time, and every one of them is about the Philadelphia Eagles football team. (This scenario is fictional, although Eagles fans are indeed known for their passionate and outspoken views.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"618d\">If we just counted up all the tweets in the sample and examined what they were about, we\u2019d conclude that America is obsessed with the Philadelphia Eagles. After all, half of all tweets in our sample are about that one subject! But in reality, only 0.1% of&nbsp;<em>the people<\/em>&nbsp;in our sample have mentioned the Eagles at all. By presenting our findings based on users rather than posts, we can more accurately depict the experiences of any given individual in the population and balance out the influence of extremely active accounts.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"aff4\"><strong>Do we have an adequate sample size to identify specific behaviors or go deep into particular groups?<\/strong>&nbsp;This issue is especially relevant to our work with U.S. adults on Twitter, where we have recruited a representative sample of users to stand in for a broader population. Because a small share of users produce most tweets, the bulk of the people we sampled tweet extremely rarely, if ever. In fact, the median American on Twitter posts just&nbsp;<a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/politics\/2020\/10\/15\/differences-in-how-democrats-and-republicans-behave-on-twitter\/\" rel=\"noreferrer noopener\" target=\"_blank\">one tweet per month<\/a>. That means we simply don\u2019t have a lot of usable data about these individuals\u2019 posting behaviors.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"277c\">Here\u2019s how this can limit our ability to analyze particular topics. Let\u2019s say we want to do a deep-dive analysis of the demographics and attitudes of people who posted the #BlackLivesMatter hashtag on Twitter. This is one of the most popular hashtags in history and was&nbsp;<a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/short-reads\/2020\/06\/10\/blacklivesmatter-surges-on-twitter-after-george-floyds-death\/\" rel=\"noreferrer noopener\" target=\"_blank\">used nearly 50 million times<\/a>&nbsp;on all of Twitter in just two weeks in 2020, shortly after the police killing of George Floyd in Minneapolis.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"d2ba\">But although we can easily identify public tweets containing that hashtag using the Twitter API, we know very little about the opinions, demographics or other personal characteristics of the people sharing them. That is something our survey panel of U.S. adults with matched Twitter handles could help us understand.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"3b2f\">Using that sample, we have estimated that&nbsp;<a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/politics\/2020\/10\/15\/differences-in-how-democrats-and-republicans-behave-on-twitter\/pdl_10-15-20_twitter-party-00-6\/\" rel=\"noreferrer noopener\" target=\"_blank\">3% of U.S. adults on Twitter<\/a>&nbsp;used the #BlackLivesMatter hashtag between November 2019 and September 2020. But unfortunately, that 3% figure works out to just 148 actual respondents from our survey panel. And once we account for the&nbsp;<a href=\"https:\/\/medium.com\/pew-research-center-decoded\/weighting-survey-data-with-the-pewmethods-r-package-d040afb0d2c2\">design effect<\/a>&nbsp;of our survey, that\u2019s an effective sample size of just 77 adults, which is simply not large enough to conduct a robust analysis of that group as a stand-alone entity. And notably, this is for a relatively common behavior. Many online actions \u2014 whether tweeting about a particular topic, following a particular account or posting a link to a particular article \u2014 are going to be too rare for us to measure at all with any degree of specificity.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"7669\">The issues mentioned earlier in this post can be addressed by shifting our analytic frame, but there isn\u2019t much of a fix for this one, short of recruiting many more members to our survey panel, which comes with significant logistical and financial challenges. This is simply an inherent limitation of social media collections with \u201conly\u201d a few thousand respondents or accounts.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Pareto principle, or \u201c80\/20 rule,\u201d holds that in many systems, a minority of cases produce the majority of outcomes.<\/p>\n","protected":false},"author":377,"featured_media":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"sub_headline":"","sub_title":"","_prc_public_revisions":[],"_ppp_expiration_hours":0,"_ppp_enabled":false,"ai_generated_summary":"","relatedPosts":[],"datacite_doi":"","datacite_doi_citation":"","_prc_seo_qr_attachment_id":0,"spoken_article_player_enabled":true,"displayBylines":true,"footnotes":"","prc_watchers":[],"_prc_fork_parent":0,"_prc_fork_status":"","_prc_active_fork":0},"categories":[353],"bylines":[973],"collection":[],"_post_visibility":[],"decoded-category":[530],"formats":[454],"_fund_pool":[],"languages":[],"regions-countries":[],"research-teams":[524],"workflow-status":[],"class_list":["post-111433","decoded","type-decoded","status-publish","hentry","category-data-science","bylines-aaron-smith","decoded-category-data-science","formats-decoded","research-teams-decoded"],"label":"Decoded","post_parent":0,"word_count":1255,"canonical_url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/decoded\/2021\/02\/25\/contending-with-the-80-20-rule-when-studying-online-behavior\/","art_direction":{"A1":{"id":139485,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?w=564&h=317&crop=1","width":564,"height":317,"caption":"","chartArt":false},"A2":{"id":139485,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?w=268&h=151&crop=1","width":268,"height":151,"caption":"","chartArt":false},"A3":{"id":139485,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?w=194&h=110&crop=1","width":194,"height":110,"caption":"","chartArt":false},"A4":{"id":139485,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?w=268&h=151&crop=1","width":268,"height":151,"caption":"","chartArt":false},"XL":{"id":139485,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?w=720&h=405&crop=1","width":720,"height":405,"caption":"","chartArt":false},"social":{"id":139485,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2022\/08\/2021.02.25_decoded_featured.png?w=1200&h=628&crop=1","width":1200,"height":628,"caption":"","chartArt":false}},"_embeds":[],"watchers":[],"table_of_contents":[],"datacite_doi":"","prc_seo_data":{"title":"Contending with the 80\/20 rule when studying online behavior","description":"The Pareto principle, or \u201c80\/20 rule,\u201d holds that in many systems, a minority of cases produce the majority of outcomes.","og_title":"Contending with the 80\/20 rule when studying online behavior","og_description":"The Pareto principle, or \u201c80\/20 rule,\u201d holds that in many systems, a minority of cases produce the majority of outcomes.","schema_type":"Article","noindex":false,"canonical_url":"","primary_terms":{"category":1},"custom_schema":[],"og_image":139485,"indexnow_submitted_at":null,"gsc_index_status":null},"prepublish_checks":{},"jetpack_sharing_enabled":true,"relatedPostsOrdered":[],"bylinesOrdered":[{"key":"_if1eyamor","termId":973}],"acknowledgementsOrdered":[],"_links":{"self":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/decoded\/111433","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/decoded"}],"about":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/types\/decoded"}],"author":[{"embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/users\/377"}],"replies":[{"embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/comments?post=111433"}],"version-history":[{"count":2,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/decoded\/111433\/revisions"}],"predecessor-version":[{"id":139514,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/decoded\/111433\/revisions\/139514"}],"wp:attachment":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/media?parent=111433"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/categories?post=111433"},{"taxonomy":"bylines","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/bylines?post=111433"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/collection?post=111433"},{"taxonomy":"_post_visibility","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/_post_visibility?post=111433"},{"taxonomy":"decoded-category","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/decoded-category?post=111433"},{"taxonomy":"formats","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/formats?post=111433"},{"taxonomy":"_fund_pool","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/_fund_pool?post=111433"},{"taxonomy":"languages","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/languages?post=111433"},{"taxonomy":"regions-countries","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/regions-countries?post=111433"},{"taxonomy":"research-teams","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/research-teams?post=111433"},{"taxonomy":"workflow-status","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/workflow-status?post=111433"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}