{"id":39491,"date":"2018-12-17T09:50:57","date_gmt":"2018-12-17T14:51:00","guid":{"rendered":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/2018\/12\/17\/methodology-18-2\/"},"modified":"2024-07-25T11:27:54","modified_gmt":"2024-07-25T15:27:54","slug":"methodology-18-2","status":"publish","type":"post","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/methodology-18-2\/","title":{"rendered":"Methodology"},"content":{"rendered":"<p class=\"wp-block-paragraph\">To analyze image search results for various occupations, researchers completed a four-step process. First, they created a list of U.S. occupations based on Bureau of Labor Statistics (BLS) data. Second, they translated these occupation search terms into different languages. Third, the team collected data for both the U.S. and international analysis from Google Image Search and manually verified whether or not the image results were relevant to the occupations being analyzed. Finally, researchers deployed a machine vision algorithm to detect faces within photographs, and then estimate whether those faces belong to men or women. The aggregated results of those predictions are the primary data source for this report.<\/p>\n\n<h4 id=\"constructing-the-occupation-list\" class=\"wp-block-heading\">Constructing the occupation list<\/h4>\n\n<p class=\"wp-block-paragraph\">Because researchers wanted to compare the gender breakdown in image results to real-world gender splits in occupations, the team\u2019s primary goal was to match the terms used in Google Image searches with the titles in BLS as closely as possible.<\/p>\n\n<p class=\"wp-block-paragraph\">But the technical language of the BLS occupations sometimes led to questionable search results. For example, searches for \u201celigibility interviewers, government programs\u201d returned images from a small number of specialized websites that actually used that specific phrase, biasing results toward those websites\u2019 images. 
So, the research team decided to filter out highly technical terms, using Google Trends to assess each term\u2019s search popularity relative to a reference occupation (\u201cchildcare worker\u201d).<\/p>\n\n<p class=\"wp-block-paragraph\">The query selection process for the U.S. analysis involved the following steps:<\/p>\n\n<ol class=\"wp-block-list\">\n<li>Start with the list of <a href=\"https:\/\/www.bls.gov\/cps\/cpsaat11.htm\">BLS job titles<\/a> in 2017.<\/li>\n<li>Exclude occupations that do not have information about the fraction of women employed. For example, \u201ccredit analysts\u201d did not have information about the fraction of women in that occupation.<\/li>\n<li>Filter out occupations that do not have at least 100,000 workers in the U.S.<\/li>\n<li>Remove all occupations with ambiguous job functions (\u201call other,\u201d \u201cMisc.\u201d).<\/li>\n<li>Split all titles with composite job functions into individual job titles (for example, \u201cmodels and demonstrators\u201d to \u201cmodels,\u201d \u201cdemonstrators\u201d).<\/li>\n<li>Change plural words to singular (\u201cmodels\u201d to \u201cmodel\u201d) to standardize across occupations.[7. For one search (\u201cbellhops\u201d), researchers inadvertently used the plural form of the word for the U.S. analysis.]<\/li>\n<li>Manually inspect the list to ensure that the occupations were comprehensible and likely to describe human workers. This involved removing terms that might not apply to humans (such as tester, sorter) based on the researchers\u2019 review of Google results.<\/li>\n<li>Use Google Trends to remove unpopular or highly technical job titles. 
Highly technical job titles like \u201celigibility interviewers, government programs\u201d are searched for less frequently than less technical titles, such as \u201clawyer.\u201d Accordingly, researchers decided to remove technical terms in a systematic fashion by comparing the relative search intensity of each potential job title against that of a reasonably common job title.[8. Google Trends returns results on a scale from 0 to 100, with 100 representing the highest search intensity for the terms queried within the selected region and time frame and zero the lowest.] The research team compared the search intensity results for each occupation with the search intensity of \u201cchildcare worker\u201d using U.S. search interest in 2017. Any terms with search intensity below \u201cchildcare worker\u201d were removed from the list of job titles. The reference occupation \u201cchildcare worker\u201d was selected after researchers manually inspected the relative search popularity of various job titles and decided that \u201cchildcare worker\u201d was popular enough that using it as a benchmark would remove many highly technical search terms.<\/li>\n<\/ol>\n\n<p class=\"wp-block-paragraph\">The global part of the analysis uses a different list of job titles meant to capture more general descriptions of the same occupations. 
The steps to create that list include:<\/p>\n\n<ol class=\"wp-block-list\">\n<li>Start with the list of <a href=\"https:\/\/www.bls.gov\/cps\/cpsaat11.htm\">BLS job titles<\/a> that had at least 100,000 people working in the occupation in the U.S.<\/li>\n<li>Remove all occupations with ambiguous job functions (\u201call other\u201d, \u201cMisc.\u201d).<\/li>\n<li>Split all titles with composite job functions into individual job titles (For example, \u201cmodels and demonstrators\u201d to \u201cmodels,\u201d \u201cdemonstrators\u201d).<\/li>\n<li>Change plural words to singular (\u201cmodels\u201d to \u201cmodel\u201d) to standardize across occupations.<\/li>\n<li>Manually inspect the list to ensure that the occupations were comprehensible and likely to describe human workers. This involved removing terms that might not apply to humans (such as tester, sorter) based on manual review of Google results.<\/li>\n<li>Replace technical job titles with more general ones when possible to simplify translations and better represent searches. For example, instead of searching for \u201cpostsecondary teacher,\u201d the team searched for \u201cprofessor,\u201d and instead of \u201cchief executive,\u201d the team used \u201cCEO.\u201d<\/li>\n<li>Use Google Trends to filter unpopular job titles relative to a reference occupation (\u201cchildcare worker\u201d), following the same procedure described above. Any terms with search intensity below the search intensity of the reference occupation were removed.<\/li>\n<li>Select the top 100 terms with the most popular search intensity in Google in the U.S. 
within the past year.<\/li>\n<li>Translate each job title and determine which form to use when multiple translations were available.<\/li>\n<\/ol>\n\n<h4 id=\"translations\" class=\"wp-block-heading\">Translations<\/h4>\n\n<p class=\"wp-block-paragraph\">To conduct the international analysis, the research team chose to examine image results within a subset of G20 countries, which collectively account for 63% of the global economy. These countries include Argentina, Australia, Brazil, Canada, France, Germany, India, Indonesia, Italy, Japan, Mexico, Russia, Saudi Arabia, South Africa, South Korea, Turkey, the United Kingdom and the United States. The analysis excludes the European Union because some of its member states are included separately and China because Google is <a href=\"http:\/\/www.latimes.com\/business\/la-fi-0614-google-china-20140614-story.html\">blocked<\/a> in the country. Researchers used the official language of each country for each search (or the most popular language if there were multiple official languages), and worked with a translation service, cApStAn, to develop the specific search queries.<\/p>\n\n<p class=\"wp-block-paragraph\">To approximate search results for each country, researchers adjusted Google\u2019s country and language settings. For example, to search jobs in India, job titles were queried in the Hindi language with the country set to India. Several countries in the study share the same official language; for example, Argentina and Mexico both have Spanish as their official language. In these cases, researchers executed separate queries for each language and country combination. The languages used in the searches were: Modern Standard Arabic, English, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish and Turkish.<\/p>\n\n<p class=\"wp-block-paragraph\">Many languages spoken in these countries have gender-specific words for each occupation term. 
For example, in German, adding \u201cin\u201d to the end of the word \u201cmusiker\u201d (musician) gives a female connotation to the word. However, the word \u201cmusiker\u201d may not exclusively imply \u201cmale musician,\u201d and it is not the case that only male musicians can be referred to as \u201cmusiker.\u201d In consultation with the translation team, researchers identified the gender form of each job that would be used when a person of unknown gender is referenced, and searched for those terms. The masculine form is the default choice for most languages and occupations, but the translation team recommended using the feminine form in cases where it was more commonly used. For example, researchers searched for \u201cnurse\u201d in Italian using the feminine term \u201cinfermi\u00e8re\u201d rather than the masculine \u201cinfirmier\u201d on the advice of the translation team.<\/p>\n\n<p class=\"wp-block-paragraph\">In addition, some titles do not have a directly equivalent title in another language. For example, the job term \u201ccompliance officer\u201d does not have an Italian equivalent. Finally, the same translated term can refer to different occupations in some languages. Jobs lacking an equivalent translation in a given language were excluded. As a result, not all languages have exactly 100 search terms.<\/p>\n\n<h4 id=\"data-collection\" class=\"wp-block-heading\">Data collection<\/h4>\n\n<p class=\"wp-block-paragraph\">To create the master dataset used for both analyses, researchers built a data pipeline to streamline image collection, facial recognition and extraction, and facial classification tasks. To ensure that a large number of images could be processed in a timely manner, the team set up a database and analysis environment on the Amazon Web Services (AWS) cloud, which enabled the use of graphics processing units (GPUs) for faster image processing. 
Building this pipeline also allowed the researchers to collect additional labeled training images relatively quickly, which they leveraged to increase the diversity of the training set in advance of classifying the image search results.<\/p>\n\n<p class=\"wp-block-paragraph\">Search results can be affected by the timing of the queries: Some photos could be more relevant during the time the query is executed, and therefore have a higher rank in the search results compared with searches at other times.<\/p>\n\n<p class=\"wp-block-paragraph\">There are a number of filters users can apply to the images returned by Google. Under \u201cTools,\u201d for example, users can signal to Google Image Search that they would like to receive images of different types, including \u201cFace,\u201d \u201cPhoto\u201d and \u201cClip Art,\u201d among other options. Users can also filter images by size and usage rights. For this study, researchers collected images using both the \u201cphoto\u201d and \u201cface\u201d filter settings, but the results presented in this report use the \u201cphoto\u201d filter only. Researchers made this decision because the \u201cphoto\u201d filter appeared to provide more diverse kinds of images than the \u201cface\u201d filter, while also excluding clip art and animated representations of jobs.<\/p>\n\n<h4 id=\"removing-irrelevant-queries\" class=\"wp-block-heading\">Removing irrelevant queries<\/h4>\n\n<p class=\"wp-block-paragraph\">For occupations included across both the U.S. and international analysis term lists, some queries returned images that did not depict individuals engaged in the occupation being examined. Instead, they often returned images that showed clients or customers, rather than practitioners of the occupation, or depicted non-human objects. 
For example, the majority of image results for the term \u201cphysical therapist\u201d showed individuals receiving care rather than individuals engaging in the duties associated with being a physical therapist.<\/p>\n\n<p class=\"wp-block-paragraph\">To ensure the relevance of detected faces, researchers reviewed all of the collected images for each language, country and occupation combination. For the U.S. analysis, there were a total of 239 sets of images to review. For the international analysis, there were 1,800 sets of images to review. Queries were categorized into one of four categories based on the contents of the collected images.<\/p>\n\n<ul class=\"wp-block-list\">\n<li>\u201cPass\u201d: More than half of collected images depict only individuals employed in the queried occupation. Overall, 44% of jobs in the U.S. analysis and 43% of jobs in the global analysis fell into this category.<\/li>\n<li>\u201cFail\u201d: The majority of collected images do not depict any face or depict faces irrelevant to the desired occupation. In many languages, the majority of collected images for the occupation \u201cbarber\u201d depict only people who have been to a barber, rather than the actual barber. In the analysis of international search results, this includes queries that return images of an occupation different from that initially defined by the English translation. For example, the Arabic translation of \u201cjanitor\u201d returns images of soccer goalies when queried in Saudi Arabia. Because the faces depicted in these images are not representative of the desired occupation, we categorize these queries as \u201cfail.\u201d A total of 31% of jobs in the U.S. analysis and 37% of jobs in the global analysis fell into this category.<\/li>\n<li>\u201cComplicated\u201d: The majority of collected images depict multiple people, some of whom are engaged in the queried occupation and some of whom are not. 
For example, the term \u201cpreschool teacher\u201d and its translations often return images that feature not only a teacher but also students. These queries are categorized as \u201ccomplicated\u201d because of the difficulty in isolating the relevant faces. A total of 23% of jobs in the U.S. analysis and 17% of jobs in the global analysis fell into this category.<\/li>\n<li>\u201cAmbiguous\u201d: Some queries do not fall into the other categories, as there is no clear majority of image type or it is unclear whether the people depicted in the collected images are engaged in the occupation of interest. This may occur if the term has many definitions, such as \u201ctrainer,\u201d which can refer to a person who trains athletes or various training equipment, or if the term has other usage in popular culture, such as the surname of a public figure (\u201cbaker\u201d) or the name of a popular movie (\u201ctaxi driver\u201d). Just 2% of jobs in the U.S. analysis and 3% of jobs in the global analysis fell into this category.<\/li>\n<\/ul>\n\n<p class=\"wp-block-paragraph\">To minimize any error caused by irrelevance of detected faces in collected images, we remove all queries categorized as \u201cfail,\u201d \u201ccomplicated\u201d or \u201cambiguous\u201d and only retain those queries categorized as \u201cpass.\u201d\n<a name=\"machine-vision\"><\/a><\/p>\n\n<h4 id=\"machine-vision-for-gender-classification\" class=\"wp-block-heading\">Machine vision for gender classification<\/h4>\n\n<p class=\"wp-block-paragraph\">Researchers used a method called \u201ctransfer learning\u201d to train a gender classifier, rather than using machine vision methods developed by an outside vendor. 
In some commercial and noncommercial <a href=\"https:\/\/arxiv.org\/pdf\/1611.00851.pdf\">alternative<\/a> classifiers, \u201cmultitask\u201d learning methods are used to simultaneously perform face detection, landmark localization, pose estimation, gender recognition and other face analysis tasks. The research team\u2019s classifier achieved high accuracy for the gender classification task, while allowing the research team to monitor a variety of important performance metrics.<\/p>\n\n<h4 id=\"face-detection\" class=\"wp-block-heading\">Face detection<\/h4>\n\n<p class=\"wp-block-paragraph\">Researchers used the face detector from the Python library <em>dlib <\/em>to identify all faces in the image. The program identifies four coordinates of the face: top, right, bottom and left (in pixels). This system <a href=\"http:\/\/dlib.net\/dnn_face_recognition_ex.cpp.html\">achieves<\/a> 99.4% accuracy on the popular <a href=\"http:\/\/vis-www.cs.umass.edu\/lfw\/\">Labeled Faces in the Wild<\/a> dataset. The research team cropped the faces from the images and stored them as separate files.<\/p>\n\n<p class=\"wp-block-paragraph\">Many images collected do not contain any individuals at all. For example, all images returned by Google for the German word \u201cBarmixer\u201d are images of a cocktail shaker product, even with the country search parameter set to Germany. To avoid drawing inference based on a small number of images, researchers included only queries that have at least 80 images downloaded and 50 images with at least one face detected in the analysis. Across different countries, the number of faces detected in the images varied. Hindi and Indonesian had the most detected faces. This means that their images tend to feature more people in them than other languages.<\/p>\n\n<p class=\"wp-block-paragraph\">The table below summarizes the number of queries, number of images and number of faces detected. 
Overall, researchers were able to collect over 95% of the top 100 images that they sought to download.<\/p>\n\n<h4 id=\"training-the-model\" class=\"wp-block-heading\">Training the model<\/h4>\n\n<p class=\"wp-block-paragraph\"><a name=\"training-images\"><\/a><\/p>\n\n<figure class=\"wp-block-image alignright\"><a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/?attachment_id=25579\" rel=\"attachment wp-att-25579\"><img data-dominant-color=\"edeeee\" data-has-transparency=\"false\" style=\"--dominant-color: #edeeee;\" decoding=\"async\" sizes=\"(max-width: 420px) 100vw, 420px\" class=\"wp-image-25579 not-transparent\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/wp-content\/uploads\/sites\/3\/2018\/12\/12.17.18_gender_jobs_methodology-00.png\" alt=\"Image search data by country-language\"><\/a><\/figure>\n\n<p class=\"wp-block-paragraph\">Recent research has provided evidence of algorithmic bias in image classification systems from a variety of high-profile vendors.[9. See Buolamwini, Joy and Timnit Gebru. 2018. \u201c<a href=\"http:\/\/proceedings.mlr.press\/v81\/buolamwini18a.html\">Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification<\/a>.\u201d Proceedings of Machine Learning Research.]\u00a0This problem is believed to stem from imbalanced training data that often overrepresents white men. For this analysis, researchers decided to train a new gender classification model using a more balanced image training set.<\/p>\n\n<p class=\"wp-block-paragraph\">However, training an image classifier is a daunting task because collecting a large labeled dataset for training is very time- and labor-intensive, and training a model from scratch is often too computationally intensive to carry out. 
To avoid these challenges, the research team relied on a technique called \u201ctransfer learning,\u201d which involves recycling large pretrained neural networks (a popular class of machine learning models) for more specific classification tasks. The key innovation of this technique is that lower layers of the pretrained neural networks often contain features that are useful across different image classification tasks. Researchers can reuse these pretrained lower layers and fine-tune the top layers for their specific application \u2013 in this case, the gender classification task.<\/p>\n\n<p class=\"wp-block-paragraph\">The specific pretrained network researchers used is <a href=\"https:\/\/gist.github.com\/baraldilorenzo\/07d7802847aaad0a35d3\">VGG16<\/a>, implemented in the popular deep learning Python package <em>Keras<\/em>. The VGG network architecture was introduced by Karen Simonyan and Andrew Zisserman in their 2014 paper \u201c<a href=\"https:\/\/arxiv.org\/abs\/1409.1556\">Very Deep Convolutional Networks for Large Scale Image Recognition<\/a>.\u201d The model is trained using ImageNet, which has over 1.2 million images and 1,000 object categories. Other common pretrained models include ResNet and Inception. VGG16 contains 16 weight layers that include several <a href=\"https:\/\/en.wikipedia.org\/wiki\/Convolutional_neural_network#Convolutional_layer\">convolution<\/a> and fully connected layers. The VGG16 network has achieved a 90% top-5 accuracy in ImageNet classification.[10. The top-5 accuracy is calculated by counting the times a predicted label matched the target label, divided by the number of data points evaluated for the five categories with the highest probabilities.]<\/p>\n\n<p class=\"wp-block-paragraph\">Researchers began with the classic architecture of the VGG16 neural network as a base, then added one fully connected layer, one dropout layer and one output layer. 
The team conducted two rounds of training \u2013 one for the layers added for the gender classification task (the custom model), and subsequently one for the upper layers of the VGG base model.<\/p>\n\n<p class=\"wp-block-paragraph\">Researchers froze the VGG base weights so that they could not be updated during the first round of training, and restricted training during this phase to the custom layers. This choice reflects the fact that weights for the new layers are randomly initialized, so allowing the VGG weights to be updated at this stage would destroy the information contained within them. After 20 epochs of training on just the custom model, the team then unfroze the top four layers of the VGG base and began a second round of training. For the second round of training, researchers implemented an early-stopping function. Early stopping checks the progress of the model loss (or error rate) during training, and halts training when the validation loss ceases to improve. This both saves time and keeps the model from overfitting to the training data.<\/p>\n\n<p class=\"wp-block-paragraph\">In order to prevent the model from overfitting to the training images, researchers randomly augmented each image during the training process. These random augmentations included rotations, shifting of the center of the image, zooming in\/out, and shearing the image. 
As such, the model never saw the same image twice during training.<\/p>\n\n<h4 id=\"selecting-training-images\" class=\"wp-block-heading\">Selecting training images<\/h4>\n\n<figure class=\"wp-block-image alignright\"><a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/2018\/12\/17\/methodology-18\/12-17-18_gender_jobs_methodology-01\/\" rel=\"attachment wp-att-25580\"><img data-dominant-color=\"e3e0e0\" data-has-transparency=\"false\" style=\"--dominant-color: #e3e0e0;\" decoding=\"async\" sizes=\"(max-width: 195px) 100vw, 195px\" class=\"wp-image-25580 not-transparent\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/wp-content\/uploads\/sites\/3\/2018\/12\/12.17.18_gender_jobs_methodology-01.png\" alt=\"Training datasets\"><\/a><\/figure>\n\n<figure class=\"wp-block-image alignright\"><a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/2018\/12\/17\/methodology-18\/12-17-18_gender_jobs_methodology-02\/\" rel=\"attachment wp-att-25581\"><img decoding=\"async\" class=\"wp-image-25581\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/wp-content\/uploads\/sites\/3\/2018\/12\/12.17.18_gender_jobs_methodology-02.png\" alt=\"Validation datasets\"><\/a><\/figure>\n\n<p class=\"wp-block-paragraph\">Image classification systems, even those that draw on pretrained models, require a substantial amount of training and validation data. These systems also demand diverse training samples if they are to be accurate across demographic groups. To ensure that the model was accurate when it came to classifying the gender of people from diverse backgrounds, researchers took a variety of steps. First, the team located existing datasets used by researchers for image analysis. These include the \u201cLabeled Faces in the Wild\u201d (LFW) and \u201cBainbridge 10K U.S. Adult Faces\u201d datasets. 
Second, the team downloaded images of Brazilian politicians from a site that hosts municipal-level election results. Brazil is a racially diverse country, and that is reflected in the demographic diversity of its politicians. Third, researchers created original lists of celebrities who belong to different minority groups and collected 100 images for each individual. The list of minority celebrities focused on famous black and Asian individuals. The list of famous black individuals includes 22 people: 11 men and 11 women. The list of famous Asian individuals includes 30 people: 15 men and 15 women. Researchers then compiled a list of the 100 most populous countries and downloaded up to 100 images of men and of women for each nation-gender combination (for example, \u201cFrench man\u201d). This choice helped ensure that the training data included images that feature people from a diverse set of countries, balancing out the overrepresentation of white people in the training dataset. Finally, researchers supplemented this list with a set of 21 celebrity seniors (11 men and 10 women) to help improve model accuracy on older individuals. This allowed researchers to easily build up a demographically diverse dataset of faces with known gender and racial profiles.<\/p>\n\n<p class=\"wp-block-paragraph\">Some images feature multiple people. To ensure that the images were directly relevant, a member of the research team reviewed each face manually and removed irrelevant or erroneous faces (e.g., men in images with women). Researchers also removed images that were too blurry or too small, as well as those in which much of the face was obscured. In summary, the training data consist of 14,351 men and 12,630 women in images. 
The images belong to seven different datasets.<\/p>\n\n<h4 id=\"model-performance\" class=\"wp-block-heading\">Model performance<\/h4>\n\n<p class=\"wp-block-paragraph\">To evaluate whether the model was accurate, researchers applied it to a subset equivalent to 20% of the image sources: a \u201cheld out\u201d set that was not used for training purposes. The model achieved an overall accuracy of 95% on this set of validation data. The model was also accurate on particular subsets of the data, achieving 0.96 positive predictive value on the black celebrities subset, for example.<\/p>\n\n<p class=\"wp-block-paragraph\">As a final validation exercise, researchers used an <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/internet\/2016\/07\/11\/research-in-the-crowdsourcing-age-a-case-study\/\">online labor market<\/a> to create a hand-coded random sample of 996 faces. This random subset of images overrepresented men \u2013 665 of the images were classified as male. Each face was coded by three online workers. For the 924 faces that had consensus across the three coders, the model\u2019s overall accuracy on this sample was 88%. Using the value 1 for \u201cmale\u201d and 0 for \u201cfemale,\u201d the precision and recall of the model were 0.93 and 0.90, respectively. This suggests that the model performs slightly worse for female faces, but that the rates of false positives and negatives were relatively low. 
Researchers found that many of the misclassified images were blurry, smaller in size, or obscured.<\/p>\n\n<figure class=\"wp-block-image aligncenter\"><a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/2018\/12\/17\/methodology-18\/12-17-18_gender_jobs_methodology-03\/\" rel=\"attachment wp-att-25582\"><img decoding=\"async\" class=\"wp-image-25582\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/wp-content\/uploads\/sites\/3\/2018\/12\/12.17.18_gender_jobs_methodology-03.png\" alt=\"Model performance statistics\"><\/a><\/figure>\n\n<figure class=\"wp-block-image aligncenter\"><a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/2018\/12\/17\/methodology-18\/12-17-18_gender_jobs_methodology-04\/\" rel=\"attachment wp-att-25583\"><img data-dominant-color=\"edeeee\" data-has-transparency=\"false\" style=\"--dominant-color: #edeeee;\" decoding=\"async\" sizes=\"(max-width: 420px) 100vw, 420px\" class=\"wp-image-25583 not-transparent\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/wp-content\/uploads\/sites\/3\/2018\/12\/12.17.18_gender_jobs_methodology-04.png\" alt=\"Comparison of BLS data and image search results\"><\/a><\/figure>\n\n<figure class=\"wp-block-image aligncenter\"><a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/2018\/12\/17\/methodology-18\/12-17-18_gender_jobs_methodology-05\/\" rel=\"attachment wp-att-25584\"><img decoding=\"async\" class=\"wp-image-25584\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/wp-content\/uploads\/sites\/3\/2018\/12\/12.17.18_gender_jobs_methodology-05.png\" alt=\"Comparison of BLS data and image search results\"><\/a><\/figure>\n\n<figure class=\"wp-block-image aligncenter\"><a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/2018\/12\/17\/methodology-18\/12-17-18_gender_jobs_methodology-06\/\" rel=\"attachment wp-att-25585\"><img decoding=\"async\" 
class=\"wp-image-25585\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/wp-content\/uploads\/sites\/3\/2018\/12\/12.17.18_gender_jobs_methodology-06.png\" alt=\"Comparison of BLS data and image search results\"><\/a><\/figure>\n\n<figure class=\"wp-block-image aligncenter\"><a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/2018\/12\/17\/methodology-18\/12-17-18_gender_jobs_methodology\/\" rel=\"attachment wp-att-25587\"><img data-dominant-color=\"f1f1ef\" data-has-transparency=\"false\" style=\"--dominant-color: #f1f1ef;\" decoding=\"async\" sizes=\"(max-width: 310px) 100vw, 310px\" class=\"wp-image-25587 not-transparent\" src=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/wp-content\/uploads\/sites\/3\/2018\/12\/12.17.18_gender_jobs_methodology.png\" alt=\"Comparison of BLS data and image search results\"><\/a><\/figure>\n\n<p class=\"wp-block-paragraph\">These results are also available in downloadable form <a href=\"https:\/\/alpha.pewresearch.org\/pewresearch-org\/social-trends\/dataset\/percent-of-total-employed-in-the-us-and-and-google-image-search-results-by-gender\/\">here<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>To analyze image search results for various occupations, researchers completed a four-step process. First, they created a list of U.S. occupations based on Bureau of Labor Statistics (BLS) data. Second, they translated these occupation search terms into different languages. Third, the team collected data for both the U.S. 
and international analysis from Google Image Search [&hellip;]<\/p>\n","protected":false},"author":367,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"sub_headline":"","sub_title":"","_prc_public_revisions":[],"_ppp_expiration_hours":0,"_ppp_enabled":false,"ai_generated_summary":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"relatedPosts":[],"reportMaterials":[],"multiSectionReport":[],"package_parts__enabled":false,"package_parts":[],"_prc_fork_parent":0,"_prc_fork_status":"","_prc_active_fork":0,"datacite_doi":"","datacite_doi_citation":"","_prc_seo_qr_attachment_id":0,"spoken_article_player_enabled":true,"displayBylines":true,"footnotes":"","prc_watchers":[],"jetpack_post_was_ever_published":false},"categories":[234,353,210,255,30,247,216],"tags":[],"bylines":[936,767,819,931],"collection":[],"datasets":[1050],"level_of_effort":[],"primary_audience":[],"information_type":[],"_post_visibility":[],"formats":[458],"_fund_pool":[],"languages":[],"regions-countries":[],"research-teams":[521,519],"workflow-status":[],"class_list":["post-39491","post","type-post","status-publish","format-standard","hentry","category-business-workplace","category-data-science","category-economics-work-gender","category-gender-demographics","category-gender-lgbtq","category-gender-work","category-gender-equality-discrimination","bylines-adam-hughes","bylines-brian-broderick","bylines-onyi-lam","bylines-stefan-wojcik","datasets-percent-of-total-employed-in-the-u-s-and-google-image-search-results-by-gender","formats-report","research-teams-data-labs","research-teams-social-trends"],"label":false,"post_parent":39502,"word_count":3378,"canonical_url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/methodology-18-2\/","art_direction":{"A1":{"id":48986,"rawUrl":"https:\/\/alpha.
pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png?w=564&h=317&crop=1","width":564,"height":317,"chartArt":false},"A2":{"id":48986,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png?w=268&h=151&crop=1","width":268,"height":151,"chartArt":false},"A3":{"id":48986,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png?w=194&h=110&crop=1","width":194,"height":110,"chartArt":false},"A4":{"id":48986,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png?w=268&h=151&crop=1","width":268,"height":151,"chartArt":false},"XL":{"id":48986,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png?w=720&h=405&crop=1","width":720,"height":405,"chartArt":false},"social":{"id":48986,"rawUrl":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_18.12.10_featured_imageSearch.png","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/PL_
18.12.10_featured_imageSearch.png?w=1200&h=628&crop=1","width":1200,"height":628,"chartArt":false}},"_embeds":[],"watchers":[],"table_of_contents":[{"id":39502,"title":"Gender and Jobs in Online Image Searches","slug":"gender-and-jobs-in-online-image-searches","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/gender-and-jobs-in-online-image-searches\/","is_active":false},{"id":39479,"title":"Acknowledgments","slug":"acknowledgments-23","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/acknowledgments-23\/","is_active":false},{"id":39491,"title":"Methodology","slug":"methodology-18-2","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/methodology-18-2\/","is_active":true}],"report_materials":[{"key":"41ba3868-68b9-4321-8ffc-48be13804086","type":"report","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-content\/uploads\/sites\/20\/2018\/12\/JobsGender_report_FINAL1.pdf","label":"","icon":"","attachmentId":48997},{"type":"dataset","id":1050,"label":"Percent of total employed in the U.S. 
and Google image search results by gender","url":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/dataset\/percent-of-total-employed-in-the-u-s-and-google-image-search-results-by-gender\/"}],"report_pagination":{"current_post":{"id":39491,"title":"Methodology","slug":"methodology-18-2","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/methodology-18-2\/","is_active":true,"page_num":3},"next_post":null,"previous_post":{"id":39479,"title":"Acknowledgments","slug":"acknowledgments-23","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/acknowledgments-23\/","is_active":false,"page_num":2},"pagination_items":[{"id":39502,"title":"Gender and Jobs in Online Image Searches","slug":"gender-and-jobs-in-online-image-searches","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/gender-and-jobs-in-online-image-searches\/","is_active":false,"page_num":1},{"id":39479,"title":"Acknowledgments","slug":"acknowledgments-23","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/acknowledgments-23\/","is_active":false,"page_num":2},{"id":39491,"title":"Methodology","slug":"methodology-18-2","link":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/data-labs\/2018\/12\/17\/methodology-18-2\/","is_active":true,"page_num":3}]},"parent_info":{"parent_title":"Gender and Jobs in Online Image Searches","parent_id":39502},"materialsOrdered":[],"chaptersOrdered":[],"partsOrdered":[],"partsEnabled":false,"datacite_doi":"","prc_seo_data":{"title":"Methodology","description":"To analyze image search results for various occupations, researchers completed a four-step process. First, they created a list of U.S. occupations based on Bureau of Labor Statistics (BLS) data. 
Second,&hellip;","og_title":"Methodology","og_description":"","schema_type":"Article","noindex":false,"canonical_url":"","primary_terms":[],"custom_schema":[],"og_image":48986,"indexnow_submitted_at":null,"gsc_index_status":null},"prepublish_checks":{"prc-image-alt-text":{"status":"complete","message":"All images have alt text.","data":null},"prc-about-this-research":{"status":"incomplete","message":"Add an \"About this research\" details block.","data":null},"prc-paragraph-count":{"status":"complete","message":"Found 30 paragraphs.","data":{"count":30}},"prc-internal-link":{"status":"complete","message":"Found 10 internal links.","data":{"count":10}}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"relatedPostsOrdered":[],"bylinesOrdered":[],"acknowledgementsOrdered":[],"_links":{"self":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/posts\/39491","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/users\/367"}],"replies":[{"embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/comments?post=39491"}],"version-history":[{"count":3,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/posts\/39491\/revisions"}],"predecessor-version":[{"id":182972,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/posts\/39491\/revisions\/182972"}],"wp:attachment":[{"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/media?parent=39491"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/categories?post=39491"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/alpha.pewres
earch.org\/pewresearch-org\/wp-json\/wp\/v2\/tags?post=39491"},{"taxonomy":"bylines","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/bylines?post=39491"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/collection?post=39491"},{"taxonomy":"datasets","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/datasets?post=39491"},{"taxonomy":"level_of_effort","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/level_of_effort?post=39491"},{"taxonomy":"primary_audience","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/primary_audience?post=39491"},{"taxonomy":"information_type","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/information_type?post=39491"},{"taxonomy":"_post_visibility","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/_post_visibility?post=39491"},{"taxonomy":"formats","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/formats?post=39491"},{"taxonomy":"_fund_pool","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/_fund_pool?post=39491"},{"taxonomy":"languages","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/languages?post=39491"},{"taxonomy":"regions-countries","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/regions-countries?post=39491"},{"taxonomy":"research-teams","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/research-teams?post=39491"},{"taxonomy":"workflow-status","embeddable":true,"href":"https:\/\/alpha.pewresearch.org\/pewresearch-org\/wp-json\/wp\/v2\/workflow-status?post=39491"}],"curies":[{"name":"wp","href":
"https:\/\/api.w.org\/{rel}","templated":true}]}}