{"id":7797,"date":"2024-02-06T13:50:00","date_gmt":"2024-02-06T18:50:00","guid":{"rendered":""},"modified":"2025-04-23T23:59:04","modified_gmt":"2025-04-24T03:59:04","slug":"how-we-used-large-language-models-to-identify-guests-on-popular-podcasts","status":"publish","type":"short-read","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/short-reads\/2024\/02\/06\/how-we-used-large-language-models-to-identify-guests-on-popular-podcasts\/","title":{"rendered":"Q&amp;A: How we used large language models to identify guests on popular podcasts"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Our recent <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/journalism\/2024\/02\/06\/most-top-ranked-podcasts-bring-on-guests\/\">report on podcast guests<\/a> involved a fairly daunting research challenge: identifying all the guests who appeared on roughly 24,000 podcast episodes in 2022, based solely on the episode descriptions. We\u2019re always on the lookout for more efficient ways of doing our work, so we decided to see if the newest generation of large language models, or LLMs \u2013 the technology behind popular tools like ChatGPT \u2013 could help us.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><em>Related:<\/em><\/strong><em> <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/journalism\/2024\/02\/06\/most-top-ranked-podcasts-bring-on-guests\/\">Most Top-Ranked Podcasts Bring On Guests<\/a><\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this Q&amp;A, we talk with the researchers who worked on the analysis \u2013 computational social scientists <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/staff\/galen-stocking\/\">Galen Stocking<\/a>, <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/staff\/meltem-odabas\/\">Meltem Odaba\u015f<\/a> and <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/staff\/samuel-bestvater\/\">Sam Bestvater<\/a> \u2013 about how they approached this task, how it worked, 
and what they learned about using the new generation of LLMs for research purposes.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"how-would-you-typically-approach-a-research-project-like-this\">How would you typically approach a research project like this?<\/h4>\n\n\n\n<figure class=\"wp-block-image alignright size-640-wide is-resized\"><img decoding=\"async\" sizes=\"(max-width: 200px) 100vw, 200px\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/2024\/02\/KSA0249-Edit-jpg.webp?w=640\" alt=\"\" class=\"wp-image-451176\" style=\"width:200px\" \/><figcaption class=\"wp-element-caption\">Galen Stocking, senior computational social scientist<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Stocking:<\/strong> With a large data source like this, our first step is typically to ask whether we can identify what we\u2019re looking for in an automated way. Until recently, on a project like this, that might mean using a script to search the episode descriptions for a list of known names or trying to train a specialized machine learning model to identify patterns in the text that signify names.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These are tried-and-true approaches that we\u2019ve used before, but they would not have worked well in this case. The episode descriptions contain tens of thousands of names \u2013 not just hosts and guests, but also the names of people who are in the news or who are discussed but don\u2019t appear on the show. And those names are not spelled or presented in a consistent format.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Instead, we likely would have had to tackle this problem by training human coders to read each of the 24,000 episode descriptions and note which guests are mentioned.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">But doing this with humans is a lot harder than it sounds. The job is boring and repetitive, so it\u2019s easy for people to lose focus. 
And when people lose focus, they can miss important information we want them to be attentive to.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">What\u2019s more, there are a lot of episode descriptions to read and code. If we assume it takes two minutes for someone to read each episode description, the full set adds up to 800 hours of work. A team of five people working eight hours a day, five days a week without any breaks would need about a month to go through the entire list.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"how-do-large-language-models-help-you-solve-those-problems\">How do large language models help you solve those problems?<\/h4>\n\n\n\n<figure class=\"wp-block-image alignright size-640-wide is-resized\"><img decoding=\"async\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/2021\/08\/Sam_Bestvater.jpg?w=640\" alt=\"\" class=\"wp-image-422890\" style=\"width:200px\" \/><figcaption class=\"wp-element-caption\">Samuel Bestvater, computational social scientist<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Bestvater:<\/strong> You might know about large language models from using OpenAI\u2019s popular ChatGPT chatbot. These models can generate natural-sounding text and carry on conversations because they are trained on large amounts of data to build an advanced internal representation of human language.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Those same traits make these models good at processing and understanding text that\u2019s already been written \u2013 like podcast episode descriptions. And we don\u2019t have to chat with them one line at a time. We can write code to interact directly and in an automated way with the underlying models themselves.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this case, we gave the model a \u201cprompt\u201d that described the data we wanted it to examine, the specific information we wanted it to retrieve, and the rules we wanted it to follow in doing so. 
Then we used an automated script to show it each episode description and ask it to retrieve any guest names. You can find the exact language we used in the <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/journalism\/2024\/02\/06\/podcast-guests-methodology\/\">methodology of our report<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We\u2019ve used similar models and their precursors before. For instance, we used them to help us identify <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/internet\/2023\/06\/29\/ten-years-of-blacklivesmatter-on-twitter\/#Expressions-of-support-and-opposition-to-the-BlackLivesMatter-movement-over-time\">tweets expressing support for or opposition to the Black Lives Matter movement<\/a> and compare the language <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/short-reads\/2023\/01\/31\/house-freedom-caucus-on-twitter-going-negative-and-getting-attention\/\">House Freedom Caucus members and other Republicans use<\/a> on Twitter, now known as X. We\u2019ve found that one of their biggest advantages is shortening the timeline for doing boring or rote classification work \u2013 exactly what we needed for this project.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"did-you-test-this-new-tool-what-were-your-concerns-going-into-the-project\">Did you test this new tool? 
What were your concerns going into the project?<\/h4>\n\n\n\n<figure class=\"wp-block-image alignright size-640-wide is-resized\"><img decoding=\"async\" sizes=\"(max-width: 200px) 100vw, 200px\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/2024\/02\/MeltmOdabas-20221214-jpg.webp?w=640\" alt=\"\" class=\"wp-image-451168\" style=\"width:200px\" \/><figcaption class=\"wp-element-caption\">Meltem Odaba\u015f, computational social scientist<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Odaba\u015f:<\/strong> Anytime we use a new tool, we test it extensively and don\u2019t simply trust that it works as advertised. In this case, we had a few baseline metrics it needed to hit. We wanted to make sure it could perform the basic task of identifying when guests were mentioned in podcast descriptions. And we also wanted to make sure it wasn\u2019t \u201c<a href=\"https:\/\/techcrunch.com\/2023\/09\/04\/are-language-models-doomed-to-always-hallucinate\/?guccounter=1\">hallucinating<\/a>\u201d \u2013 making up guests who weren\u2019t there.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">That process started with the prompt we designed to instruct the model on what we wanted it to do. These instructions restricted the source of its answers so that it only told us what was in the episode descriptions and didn\u2019t draw from any other sources. And they told the model not to guess at the answer if it wasn\u2019t sure whether the episode had guests.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We also tested the model\u2019s output on more than 2,000 randomly selected episode descriptions that our researchers had also categorized. By comparing those results, we could confirm that it performed about as well as our own trained researchers did \u2013 our threshold for using any sort of assistive technology in our published work. 
You can also find all those details in the <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/journalism\/2024\/02\/06\/podcast-guests-methodology\/\">report methodology<\/a>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"what-happened-when-you-coded-the-full-set-of-podcast-episode-descriptions-how-did-the-tool-stack-up-to-how-you-might-have-done-things-in-the-past\">What happened when you coded the full set of podcast episode descriptions? How did the tool stack up to how you might have done things in the past?<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Stocking:<\/strong> The biggest takeaway for us is that the process was very fast \u2013 especially compared to doing it by hand. The model took just a few days to churn through all 24,000 descriptions. That\u2019s a big improvement over multiple people working for weeks on end.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A lot of the issues we ran into were ones we would see with any classification project like this. Sometimes the model thought that people with similar names were the same person, or incorrectly classified show hosts as guests. But those are exactly the types of errors that human coders might make, too, and our validation and quality control processes ensured that we could spot those problems and fix them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Still, we discovered some challenges that were unique to this tool. For instance, our model\u2019s content moderation guardrails sometimes rejected episode descriptions if they mentioned concepts like crime or sex, even in fairly generic terms. It took some time to figure out what was happening and how to work around it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">All told, we found the tool to be extremely helpful for this particular project. But it isn\u2019t something you can just turn loose and expect to see good results. It needs guidance, oversight and guardrails. 
We regularly check each other\u2019s work when we <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/decoded\/2019\/10\/14\/how-we-check-numbers-and-facts-at-pew-research-center\/\">number-check<\/a> our reports, and that same sense of oversight applies to tools like this.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We asked researchers how they used the newest generation of large language models to analyze roughly 24,000 podcast episodes.<\/p>\n","protected":false},"author":658,"featured_media":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"sub_headline":"","sub_title":"","_crdt_document":"","_prc_public_revisions":[],"_ppp_expiration_hours":0,"_ppp_enabled":false,"ai_generated_summary":"","apple_news_api_created_at":"2024-02-06T18:50:22Z","apple_news_api_id":"658fa613-bd5f-4f14-8151-faa5b8443d74","apple_news_api_modified_at":"2024-02-06T21:49:48Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAAAg==","apple_news_api_share_url":"https:\/\/apple.news\/AZY-mE71fTxSBUfqluEQ9dA","apple_news_cover_media_provider":"image","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_cover_video_id":0,"apple_news_cover_video_url":"","apple_news_cover_embedwebvideo_url":"","apple_news_is_hidden":"","apple_news_is_paid":"","apple_news_is_preview":"","apple_news_is_sponsored":"","apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":[],"apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"relatedPosts":[],"_prc_fork_parent":0,"_prc_fork_status":"","_prc_active_fork":0,"datacite_doi":"","datacite_doi_citation":"","_prc_seo_qr_attachment_id":0,"spoken_article_player_enabled":true,"bylines":[{"key":"_6wow52l7i","termId":973}],"acknowledgements":[],"displayBylines":true,"footnotes":"","prc_watchers":[]},"categories":[299,350,402],"bylines":[973],"collection":[],"datasets":[],"_post_visibility"
:[],"formats":[467],"_fund_pool":[],"languages":[],"regions-countries":[515],"research-teams":[527,528],"workflow-status":[],"class_list":["post-7797","short-read","type-short-read","status-publish","hentry","category-artificial-intelligence","category-audio-radio-podcasts","category-research-explainers","bylines-aaron-smith","formats-short-read","regions-countries-united-states","research-teams-journalism","research-teams-methods"],"label":"Short Read","post_parent":0,"word_count":1138,"canonical_url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/short-reads\/2024\/02\/06\/how-we-used-large-language-models-to-identify-guests-on-popular-podcasts\/","art_direction":{"A1":{"id":8396,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png?w=564&h=317&crop=1","width":564,"height":317,"caption":"","chartArt":false},"A2":{"id":8396,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png?w=268&h=151&crop=1","width":268,"height":151,"caption":"","chartArt":false},"A3":{"id":8396,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png?w=194&h=110&crop=1","width":194,"height":110,"caption":"","chartArt":false},"A4":{"id":8396,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png","url":"https:\/\/alpha.pewresearch.org\/
pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png?w=268&h=151&crop=1","width":268,"height":151,"caption":"","chartArt":false},"XL":{"id":8396,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png?w=720&h=405&crop=1","width":720,"height":405,"caption":"","chartArt":false},"social":{"id":8396,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2024\/02\/SR_24.02.05_PodcastGuestsQA_feature.png?w=1200&h=628&crop=1","width":1200,"height":628,"caption":"","chartArt":false}},"_embeds":[],"watchers":[],"table_of_contents":[],"datacite_doi":"","prc_seo_data":{"title":"How we used large language models to identify guests on popular podcasts","description":"We asked researchers how they used the newest generation of large language models to analyze roughly 24,000 podcast episodes.","og_title":"Q&amp;A: How we used large language models to identify guests on popular podcasts","og_description":"We asked researchers how they used the newest generation of large language models to analyze roughly 24,000 podcast episodes.","schema_type":"Article","noindex":false,"canonical_url":"","primary_terms":{"regions-countries":10822328,"research-teams":10818965},"custom_schema":[],"twitter_description":"We asked researchers how they used the newest generation of large language models to analyze roughly 24,000 podcast 
episodes.","og_image":8396,"indexnow_submitted_at":null,"gsc_index_status":null},"prepublish_checks":{},"apple_news_notices":[],"jetpack_sharing_enabled":true,"relatedPostsOrdered":[],"bylinesOrdered":[{"key":"_6wow52l7i","termId":973}],"acknowledgementsOrdered":[],"_links":{"self":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/short-read\/7797","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/short-read"}],"about":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/types\/short-read"}],"author":[{"embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/users\/658"}],"replies":[{"embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/comments?post=7797"}],"version-history":[{"count":3,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/short-read\/7797\/revisions"}],"predecessor-version":[{"id":303760,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/short-read\/7797\/revisions\/303760"}],"wp:attachment":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/media?parent=7797"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/categories?post=7797"},{"taxonomy":"bylines","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/bylines?post=7797"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/collection?post=7797"},{"taxonomy":"datasets","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/datasets?post=7797"},{"taxonomy":"_post_visibility","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/_post_visibility?post=7797"},{"taxonomy":"form
ats","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/formats?post=7797"},{"taxonomy":"_fund_pool","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/_fund_pool?post=7797"},{"taxonomy":"languages","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/languages?post=7797"},{"taxonomy":"regions-countries","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/regions-countries?post=7797"},{"taxonomy":"research-teams","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/research-teams?post=7797"},{"taxonomy":"workflow-status","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/workflow-status?post=7797"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}