
I am working with a dataframe containing bills tabled in parliament and including, amongst other variables, the names of all MPs sponsoring the respective bill. The number of MPs supporting a given bill ranges from half a dozen to 450, meaning that there are a lot of NAs included, which are giving me a hard time. I want to count the frequencies of all unique names of MPs in the columns "MP1" to "MP448". For the sake of simplicity, let's work with only 3 different name columns:

df <- data.frame(cbind(
  Bill = c("housing", "education", "agriculture", "drugs"),
  MP1  = c("Bob", "Edgar", "Chris", "Bob"),
  MP2  = c("Susan", "Julia", "Reece", ""),
  MP3  = c("", "", "Julia", "")
))

> df
         Bill    MP1    MP2    MP3
1     housing    Bob  Susan     NA
2   education  Edgar  Julia     NA
3 agriculture  Chris  Reece  Julia
4       drugs    Bob     NA     NA

That is, each distinct MP name should be counted once for every bill it appears on, across all MP columns, with missing entries ignored. The desired output would be something like this:

> frequencies
  Name  Count
   Bob      2
 Edgar      1
 Chris      1
 Susan      1
 Julia      2
 Reece      1

Thank you ever so much for your help!

  • Try `as.data.frame(table(unlist(df[2:4])))` – Ritchie Sacramento Sep 23 '19 at 22:15
  • Just a note: the `cbind()` in your code to make a data frame is not necessary. – neilfws Sep 23 '19 at 22:28
  • `select(df, matches("^MP")) %>% unlist(use.names = FALSE) %>% data.frame(MP = .) %>% mutate(MP = ifelse(!nchar(MP),NA, MP)) %>% count(MP)` – Carl Boneri Sep 24 '19 at 00:02
  • Thanks, H, your command seems to work quite well. There are two more things I'm struggling with, however. Maybe you have an idea? 1. When counting the names of MPs, I get a lot of names with a frequency of 0. Why would R give me names that do not occur in the columns in question? 2. When I run the command across all of my 450 columns, I do not get names but seemingly random numbers and unreasonable frequencies, i.e. frequencies exceeding the number of observations in the dataframe. – Tobias Remschel Sep 24 '19 at 18:26
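
Both symptoms in the last comment would be consistent with the name columns being stored as factors, although that is an assumption, since the real data is not shown. `table()` reports every factor level, including levels that no longer occur, which would give counts of 0. `unlist()` only returns a factor when every column is a factor; if some columns are of another type (for example an all-`NA` logical column), the factor columns are reduced to their integer codes, which would look like random numbers. A minimal sketch under that assumption, converting everything to character before counting (the names `mp_cols` and `names_vec` are made up for illustration):

```r
# Toy data mirroring the question; empty strings stand in for missing sponsors.
df <- data.frame(
  Bill = c("housing", "education", "agriculture", "drugs"),
  MP1  = c("Bob", "Edgar", "Chris", "Bob"),
  MP2  = c("Susan", "Julia", "Reece", ""),
  MP3  = c("", "", "Julia", ""),
  stringsAsFactors = TRUE  # mimic the pre-R 4.0 default, which creates factors
)

# Select the sponsor columns by name rather than by position.
mp_cols <- grep("^MP[0-9]+$", names(df), value = TRUE)

# Convert each column to character first, so factor codes never leak through
# as integers and unused factor levels cannot appear with a count of 0.
names_vec <- unlist(lapply(df[mp_cols], as.character), use.names = FALSE)

# Treat empty strings as missing, then drop all missing values.
names_vec[!nzchar(names_vec)] <- NA
names_vec <- names_vec[!is.na(names_vec)]

# Tabulate into a two-column data frame and sort by descending count.
frequencies <- as.data.frame(
  table(Name = names_vec),
  responseName = "Count",
  stringsAsFactors = FALSE
)
frequencies <- frequencies[order(-frequencies$Count), ]
frequencies
```

Because the columns are picked with a pattern on their names, the same code should carry over to the full set of MP columns without listing positions by hand. If the real data already uses `NA` rather than empty strings, the `nzchar()` step simply does nothing.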
