Names of b.....s badder than Taylor Swift, a class in women's studies?

Once again, a Twitter trend sent me to my R prompt… Here is a bit of context. My summary: Taylor Swift apparently plays the bad girl in her new album and a fan of hers asked a question…

The tweet was then quoted by many people mentioning badass women, and I decided to have a look at these heroes!

Name tweets badder than Nutella’s

I was a bit lazy and asked Mike Kearney, rtweet maintainer, how to find tweets quoting a tweet, to which Bob Rudis answered. Now that I even had the code, it was no trouble at all getting the data. I added the filtering steps myself, see, I’m not that lazy. I also removed the link to the quoted tweet that was at the end of each tweet.

question_tweet <- "928857792982781952"
badass <-  rtweet::search_tweets(question_tweet, n = 18000, include_rts = FALSE)
badass <- dplyr::filter(badass, is_quote_status, quote_status_id == question_tweet)
badass <- dplyr::mutate(badass, text = stringr::str_replace(text, "https://t\\.co/.*$", ""))
badass <- dplyr::mutate(badass, text = trimws(text))
readr::write_csv(badass, path = "data/2017-12-03-badderb_badass.csv")

I obtained 15653 tweets. Not bad!

library("magrittr")
set.seed(20171015)

indices <- sample.int(n = nrow(badass), size = 7)
badass$text[indices]
## [1] "Carmina Barrios"                                                                                                                                                                                                      
## [2] "Anyone"                                                                                                                                                                                                               
## [3] "Shirley Temple"                                                                                                                                                                                                       
## [4] "So a lot of people have shared some bad ass women in repsonse to the tweet below. I hope someone is compiling those responses for one kick ass book."                                                                 
## [5] "Mary Bowers was a slave with a photographic memory who pretended to be \"slow-witted\" in order to spy on confederate soldiers in the house worked, and pass intel she gathered to Union forces during the Civil War."
## [6] "Ramona Quimby, age 8"                                                                                                                                                                                                 
## [7] "Snow White. Sleeping Beauty. Cinderella, Elsa, Ariel bec the category example is obvs mediocre Disney caricatures allowed by the patriarchy, right?"

Unnamed badder b…..s: your mother and grand-mother

Out of 15653, 570 contained the word “mother” – I haven’t looked for the word “mum” and haven’t checked for the fact that it is someone from the family of the tweet author. Here a few of the personal stories (or not) identified with this quick and dirty method.

set.seed(20171015)
mothers <- dplyr::filter(badass, stringr::str_detect(text, "mother"))
indices <- sample.int(n = 15)
mothers$text[indices]
##  [1] "My grandmother struggled through poverty her entire life, in a family prone to depression, addiction, and suicide. She had so many Caesarian sections that when she reached menopause she didn’t have a belly button anymore."                                                       
##  [2] "Mom married my dad. He had been married 2X before with a total of 10 kids. His first wife left him and the kid.  2nd wife died after their divorce.  He had custody of all 10, judge asked my mother if she would take them in, she did. Raised his 10 and had 3 with dad. 13 in all"
##  [3] "Each of my grandmothers raised 11 kids <f0><U+009F><U+0098><U+008F>"                                                                                                                                                                                                                 
##  [4] "My mother"                                                                                                                                                                                                                                                                           
##  [5] "My grandmother."                                                                                                                                                                                                                                                                     
##  [6] "My mother in law, who isn’t a “bitch” in any way, shape, or form, and raised three girls on her own (and who are all bad asses in their own way) without the help of a deadbeat ex."                                                                                                 
##  [7] ".@RuPaul for starters. And my great grandmother who drove a car when she was 12. My dog, too. She is definitely a bad bitch."                                                                                                                                                        
##  [8] "My grandmother escaped from the Tsar with nothing but the clothes on her back."                                                                                                                                                                                                      
##  [9] "My mother"                                                                                                                                                                                                                                                                           
## [10] "Almost any woman alive today,@xnulz \n\nAnd - Here’s another one, for sure (PLUS she’s a good mother):\n\nTonya Harding; @ITonyaMovie \n\n<U+26F8><f0><U+009F><U+00A5><U+008A><U+26F8><f0><U+009F><U+00A5><U+008A><U+2764><U+FE0F><f0><U+009F><U+00A5><U+008A><U+26F8><f0><U+009F><U+00A5><U+008A><U+26F8>"
## [11] "all of the single mothers doing the most they can for their children"                                                                                                                                                                                                                
## [12] "My grandmother was married off at the age of 14 to an older man who had already been married once, dealt with an abusive marriage but stuck around for the kids and went back to school after having 5 children and worked for 20 years to support her family"                       
## [13] "my grandmother was an army nurse in WW2.\ntaught me how to tourniquet a leg and bandage it using only gauze"                                                                                                                                                                         
## [14] "My wife birthed a goddamn child. My mother and grandmother both birthed multiple. This is too easy."                                                                                                                                                                                 
## [15] "My grandmother worked for the OSS in London during WW2 as a code breaker."

Can we talk about that belly button thing?! I’m also happy to see a diversity of things they were recognized for.

Names of the badder b…..s

Quite a few of the tweets from this trend contained the name of someone. In order to extract these names, I resorted to a language processing method called entity extraction, the entity here being a person. For that, I could have used an extractor module of the Monkeylearn platform via my own monkeylearn package.

Instead, I chose to illustrate a different method: using the cleanNLP package that I know from the excellent R Journal paper presenting it. Among other things, it serves as an interface between R and the Python library spaCy and also as an interface between R and the coreNLP Java library. Installing these tools is the painful part of the setup, but 1) you only need to install one of them 2) there are detailed instructions here 3) once your tool is installed, using the package is a breeze (and well independent of any rate limit contrary to monkeylearn use). I am at that breeze stage, you can be jealous.

There were a few tweets with infuriating encoding issues, BOM or something like that, and I decided to just ignore them by using purrr::possibly. I obviously did this to illustrate the use of this purrr function, not out of laziness.

library("cleanNLP")
init_spaCy()
# we need to remove characters like "\u0098"
badass <- dplyr::mutate(badass, text = enc2native(text))

get_entities_with_text <- function(x){
  obj <- run_annotators(x, as_strings = TRUE)
  entities <- get_entity(obj)
  entities$text <- x
  entities
}

possibly_get_entities <- purrr::possibly(get_entities_with_text,
                                         otherwise = NULL)

entities <- purrr::map_df(badass$text, possibly_get_entities)

readr::write_csv(entities, path = "data/2017-12-03-badderb_entities.csv")

I got at least one entity for 7504 out of 15653 tweets, and at least one person for 4664. I am very satisfied with this.

So, who are you, badder b…..s?

We get this kind of entities: NORP, CARDINAL, LANGUAGE, GPE, DATE, ORG, PERSON, TIME, LOC, MONEY, WORK_OF_ART, EVENT, FAC, QUANTITY, LAW, PRODUCT, ORDINAL, PERCENT. I’m more interested in PERSON and no, I’m not shouting. I chose to look at the top 12 in order to get a top 10 excluding Taylor Swift herself.

entities %>%
  dplyr::filter(entity_type == "PERSON") %>%
  dplyr::group_by(entity) %>%
  dplyr::summarise(n = n()) %>%
  dplyr::arrange(- n) %>%
  head(n = 12) %>%
  knitr::kable()
entity n
Taylor Swift 213
Taylor 145
Rosa Parks 140
Harriet Tubman 109
Dora 90
Rose West 85
Lyudmila Pavlichenko 77
Joan 71
Marie Curie 57
Myra Hindley 50
Nancy Wake 45
Hillary Clinton 41

At that point I did feel like bursting out laughing though. Dora! And I checked, we’re talking about Dora the explorer! Joan is Joan of arc. Interestingly in that top 10 we’re mixing really bad persons, e.g. Myra Hindley was a serial killer, and really badass persons, like Rosa Parks. My husband will be happy to see Marie Curie in this list, since he’s a big fan of hers, having even guided a few tours about her life in Paris.

Looking at the most frequently mentioned women obviously makes us loose well wrongly written names, and most importantly personal stories of badass mothers and the like, and of native women for instance, although I have the impression of having read about a few but probably because of my following Auriel Fournier.

Writing history?

I saw someone said they’d use the tweets as basis for history lessons. In order to get a view of a person, one could concatenate the tweets about them. Take Marie Curie for instance.

entities %>%
  dplyr::filter(entity_type == "PERSON", entity == "Marie Curie") %>%
  dplyr::summarise(text = toString(text)) %>%
  .$text
## [1] "Rachel Carson, Marie Curie, Ruth Bader Ginsberg, Madeleine Albright, Diane Fossey, Helen Keller, Gloria Steinem, Madonna, Aretha Franklin, Margot Lee Shetterly, Malala Yousafzai, and a whole lot more., Rosa Parks. Harriet Tubman. Anne Frank. Malala Yousafazi, Susan B. Anthony, Sally Ride, Marie Curie. Margaret Thatcher, Indira Ghandi, Golda Meier., ...it’s not bitch. It’s woman.\n\nMarie Curie\n\nRosa Parks\n\nEleanor Roosevelt\n\nHedy Lamar\n\nSappho\n\nAbagail Adams\n\nFlorence Nightingale\n\nSally Ride\n\nMargaret Chase Smith\n\nAnne Frank\n\nMargaret Thatcher \n\nSandra Day O’Connor\n\nOprah\n\nLilith\n\nMarilyn Monroe\n\nDita Von Teese, Ruby Bridges, Barbara Jordan, Marie Curie, Rosa Parks, Ida B Wells, Susan B Anthony, Harriet Tubman, ..., Lorde. Etc., I mean, off hand, Marie Curie was denied access to University because she was a woman, educated herself, served as a surgeon in WW1 and became the only person ever to earn two Nobel prizes, Emmeline Pankhurst, Amelia Earhart, Florence Nightingale, Rosa Parks, Joan of Arc, Marie Curie, Lilian Ngoyi, Helen Suzman, Katherine Johnson,  Frida Khalo, Catherine the Great, Eleanor Roosevelt, Mary Wollstonecraft, Valentina Tereshkova.., lmao Elizabeth Schuyler-Hamilton, Hedy Lamar, Amelia Earhart, Marie Curie, Sophie Scholl, Katherine Johnson, Dorothy Vaughn, Mary Jackson, Margaret Hamilton, Ada Lovelace, Anne Bonny, Calamity Jane, George Eliot, Mary Read, Mary Wolstncraft, Mary Shelley... need I go on?! <f0><U+009F><U+0098><U+00A1>, Marie Curie, Rose Parks\nFrancis Perkins\nJeanette Rankin\nSally Ride\nMarie Curie\nJK Rowling\n\nShall I go on?, Marie Curie: she invented radioactivity, got the Nobel prize twice and found polonium and radium., Well I would start with not calling a woman a “bitch”.\nAfter that I would say Hedy Lamarr, Susan B. Anthony, any of the @HiddenFigures, Marie Curie, Sally Ride, Molly Pitcher, Clara Barton … shall I continue?, Marie Curie literally died doing Nobel Prize winning research., Harriet Tubman, Shirley Chisholm, Rosa Parks, Ida B. Wells, Marie Curie, Maxine Waters, Lena Horne,  Ruby Dee, Angela Davis, My grandmama, my great-aunt Priscilla, my mama, my sister..., Marie Curie. Mother of radioactivity. First and only woman to win the nobel prize twice. Died due to radiation exposure., Rosa Parks, Freya Stark, Ida B Wells, Sally Ride, Marie Curie, Margaret Heafield, Amelia Earheart, Ruth Bader Ginsburg, Abigail Adams, Jane Goodall, Malala how much time you got?, Marie Curie made mobile x-ray unit, learned anatomy &amp; auto repairs, trained women, raised money, ran IRC's medical x-ray division in WWI., Marie Curie. No explanation needed., Rosa Parks, Katherine Johnson, Dorothy Vaughn, Mary Jackson, Bettie Page, Emma Watson, Malala Yousafzai, Emily Pankhurst, Anne Frank, Marie Curie, Joan of Arc, Boudicca... need I go on?, Marie Curie., Malala, Michelle Obama, Frida Kahlo, Emma Watson, Maya Angelou, Marie Curie, Lady Gaga, JK Rowling, Princess Diana, Carrie Fisher, Anne Frank, Laverne Cox...\n\n literally so many women... I could go on for weeks..., Marie Curie, Mary Anning, Dorothy Hodgkin, Jane Austin, Katherine Johnson, my GCSE geography and A Level geology teacher Sheila Tanner...., Marie Curie. She handled radioactive materials with barehand and won the Nobel Prize. How bout that? #RealTimeChem, Rosa Parks, Nellie Bly, Harriet Tubman, Jane Adams, Eleanor Roosevelt, Elizabeth Blackwell, Marie Curie, Amelia Earhart, Cleopatra, Queen Elizabeth I, Ella Fitzgerald, Grace Hopper, Dolores Huerta, Shirley Jackson, Joan of Arc, Helen Keller, Sacagawea, Sappho the Greek poet..., Marie Curie., Marie Curie., Marie Curie, Maya Angelou, Rosa Parks, Michelle Obama, my  grandma, my neuroscientist mum, Danica Patrick, the Queen of England, Angela Merkel,  @chrissyteigen @BrookeBCNN @KateBolduan - Me, Rosa Parks, Harriet Tubman, Shirley Chisholm, Ella Fitzgerald, Ruth Bader Ginsburg, Esperonza Spaulding, Sister Rossetta Tharpe, Sarah Vaughn, Aretha Franklin, Eartha Kitt, Carrie Fisher, Patti Smith, Jane Goodall, Marie Curie, Sally Ride, Leontyne Price, Princess Diana, etc..., Whoa hey now, Marie Curie won Nobel Prizes in two different sciences, only person to have done that. Probably had a lovely singing voice, too, before it got demolished by the radioactivity she was discovering for the betterment of humanity., Rosa Parks. Laura Secord. Hillary Rodham Clinton. Nellie McClung. Queen Victoria. Joan of Arc. Myrlie Evers-Williams. Tina Turner. Jane Goodall. Ruth Bader Ginsburg. Viola Desmond. Jennie Trout. Malala  Yousafzai. Margret Atwood. Ameila Earhart. J.K. Rowling. Marie Curie., Marie Curie. Won the Nobel Prize. Twice., Just off the top of my head: Elizabeth Cady Stanton, Clara Barton, Marie Curie, Amelia Earhart, Elizabeth I, Joan of Arc, Helen Keller…., Amelia Earhart, Rosa Parks, Sacagawea, Pocahontas, Susan B. Anthony, Noor Inayat Khan, Marie Curie, Alice Paul, Margaret Thatcher, Harriet Tubman, Maya Angelou, Lucy Stone, Eleanor Roosevelt ..... just to name a few., Marie Curie., The kashariyot. Shirley Chisholm. bell hooks. Golda Meir. Edith Bunker. Hedy Lamar. Marie Curie. Bella Abzug. Michelle Obama. Judith Halberstam. Jo March. Hillary Rodham Clinton., Hedy Lamarr. Marie Curie. Mae Jemison. Sally Ride. Sonia Sotomayor. Ruth Bader Ginsburg. Beyoncé. Mulan. My wife., Marie Curie is the only person to win two Nobel Prizes in two separate scientific fields and died of leukaemia as a result of her research into radionuclides which are used in medicine to save thousands of lives a year, Marie Curie.\n2 Nobel Prizes., Marie Curie who died discovering two elements and isolated radioactive isotopes., Rosa Parks, Marie Curie, Charlize Theron, Noor Inayat Khan, Rihanna, Serena Williams, Ruth Bader Ginsberg, basically all these women on this list:, Marie Curie developed the theory of radioactivity, techniques for isolating isotopes, and discovered two chemical elements. She died of aplastic anemia from exposure to radiation over the course of her research and her work at field hospitals during WWI., Marie Curie won two Nobel prizes for two different fields of science and her notebook is so radioactive that exposure to it would be lethal., How much time have you got? How about Marie Curie, or Amelia Earhart, Nancy Hart? Arlene Limas? Tomoe Gozen? Lucretia Mott? Irina Sendler? Nothing against Ms. Swift, but we could do this all day…, Marie Curie, Nobel prizes in Physics and Chemistry., Marie Curie? 1st woman in Europe to get a PhD and 1st woman to win a Nobel Prize (which she did twice, in Physics and Chemistry). She also developed a theory of a radioactivity &amp; found the elements radium and polonium. A pretty bad bitch., Marie Curie was a badass scientist who won two Nobel prizes and ate radium for breakfast. #thatlastpartwasntsogood, Rosa Parks? Harriet Tubman? Louisa May Alcott? Susan B. Anthony? Marie Curie? Amelia Earhart? Cleopatra? Frida Khalo? Mother Theresa? Sandra Day O’Connor? Oprah Winfrey? Sally Ride? Margaret Sanger? Anne Frank? Joan of Arc? Sacajawea? Sappho? Sojourner Truth? Malala Yousafzai?, Cleopatra. Mata Hari. Marie Curie. Susan B. Anthony. Hannah Arendt. . . . \n\nIt's really not hard, 1. Malala Yousafzai, Marie Curie, Rosa Parks, Georgia O'Keeffe, Maria Mitchell, Tanya Tagaq, Billie Holiday, Valentina Tereshkova, Sally Ride, Anne Frank, Maya Angelou, Sacheen Littlefeather, Frida Kahlo, Harper Lee, Marsha P Johnson, Sylvia Rivera, Helen Keller, Margaret Sanger,, Marie Curie, Marie Curie., Ada Lovelace, Jane Austen, Helen Sharman, Margaret Atwood, Mary Beard, Rosalind Franklin, Hedy Lamarr, Marie Curie, Helen Keller, Emmeline Pankhurst, Rosa Parks, Florence Nightingale, Brenda Hale, QE1, Boudicca, Amy Johnson, Elizabeth Fry, Octavia Hill, Mary Wollstonecraft, &amp;c.<f0><U+009F><U+0096><U+00A4>, Marie Curie. Most people are lucky to have 1 Nobel prize, or share it. She has 2., Sojourner Truth, Khutulun, Marie Curie, Queen Lillikulani, Hedy Lamar, Empress Wu Zetain, Hellen Mirren, Raden Ajeng Kartini, Ida B. Wells, Artemisia Gentileschi, Nancy Wake, Tomoe Gozen, Yaa Asantewaa, Boudicca, Juana Inés de la Cruz..., Marie Curie, whose greatest discovery and life’s work also ended up being what killed her and who is the only person to win a Nobel Prize in two different sciences., Joan of Arc, Marie Curie, Queen Elizabeth, Cleopatra, my mom, the lady in front of me on line at the grocery store...\n\nIt's a pretty long list., Rosa Parks. All of the women from Hidden Figures. Marie Curie. Any woman in STEM fields., Marie Curie \nThat's where the list starts"

Doing this one also gets the name of many other women. Moreover, if writing history lessons, one should have several sources, right? What about Wikidata like in this other blog post of mine? It should have data for at least the most famous badass women.

# add a function for getting a silent answer
quietly_query <- purrr::quietly(WikidataQueryServiceR::query_wikidata)

# function for getting someone's data
get_wikidata <- function(name, pb = NULL){
  if (!is.null(pb)) pb$tick()$print()
  Sys.sleep(1)
  item <- WikidataR::find_item(name, language = "en")
  # sometimes people have no Wikidata entry so I need this condition
  if(length(item) > 0){
    entity_code <- item[[1]]$id
    query <-  paste0("PREFIX entity: <http://www.wikidata.org/entity/>
                     #partial results
                     
                     SELECT ?propUrl ?propLabel ?valUrl ?valLabel ?picture
                     WHERE
                     {
                     hint:Query hint:optimizer 'None' .
                     {	BIND(entity:",entity_code," AS ?valUrl) .
                     BIND(\"N/A\" AS ?propUrl ) .
                     BIND(\"identity\"@en AS ?propLabel ) .
                     }
                     UNION
                     {	entity:", entity_code," ?propUrl ?valUrl .
                     ?property ?ref ?propUrl .
                     ?property rdf:type wikibase:Property .
                     ?property rdfs:label ?propLabel
                     }
                     
                     ?valUrl rdfs:label ?valLabel
                     FILTER (LANG(?valLabel) = 'en') .
                     OPTIONAL{ ?valUrl wdt:P18 ?picture .}
                     FILTER (lang(?propLabel) = 'en' )
                     }
                     ORDER BY ?propUrl ?valUrl
                     LIMIT 200")
    results <- quietly_query(query) 
    results <- results$result
    results$name<- name
    results
  }else{
    NULL
  }
   
  }

Yes, I just had to replace all occurrences of “sv” with “en” to get a function for this post. I’d like to try to write an automatic text about badass women.

get_a_string <- function(prop, prep, wikidata){
  answer <- dplyr::filter(wikidata, propLabel == prop) %>%
    .$valLabel %>%
    unique() %>%
    toString
  if(answer == ""){
    return("")
  }else{
    return(paste(prep, answer))
  }
}

tell_me_about <- function(name){
  wikidata <- get_wikidata(name)
  questions <- c("occupation", "country of citizenship",
                 "field of work", "award received")
  
  words <- c("a", "from", 
             "known from her work in", "and who was awarded")
  
  strings <- purrr::map2_chr(questions, words, 
                            get_a_string,
                            wikidata = wikidata)
  
  strings <- strings[strings != ""]
  
  sentence <- paste(name, "was", toString(strings))
  sentence <- paste0(sentence, ".")
  return(sentence)
}

Ok, let’s try our automatic history writing function. It won’t work for Dora and Joan, sadly.

tell_me_about("Lyudmila Pavlichenko")
## [1] "Lyudmila Pavlichenko was a historian, sniper, military personnel, from Soviet Union, and who was awarded Medal \"For the Defence of Sevastopol\", Medal \"For the Defence of Odessa\", Hero of the Soviet Union, Order of Lenin, Medal \"For Battle Merit\", Gold Star, Medal \"For the Victory over Germany in the Great Patriotic War 1941–1945\"."
tell_me_about("Myra Hindley")
## [1] "Myra Hindley was a criminal, from United Kingdom."
tell_me_about("Harriet Tubman")
## [1] "Harriet Tubman was a writer, from United States of America, and who was awarded New Jersey Hall of Fame, National Women's Hall of Fame, Maryland Women's Hall of Fame."

Not many details clearly, but not too bad for a quickly written history hum bot, if I can call it so.

So, happy, Nutella?

This was my contribution to the meme following Nutella’s viral tweet. I am thankful for the badass women I did end up discovering thanks to the tweets, and am waiting for someone to replace the lyrics of all Taylor Swift’s songs with gems from this Twitter trend.