My data is a list of academic institutions affiliated with authors of articles, and the piece I'm working with looks something like this:

1   MIT
2   NBER; NBER
3   U MI; Cornell U; U VA
4   Harvard U; U Chicago
5   U OR; U CA, Davis; U British Columbia
6   World Bank; Dartmouth College; EDHEC Business School; Harvard U
7   Columbia U and IZA; Columbia U and IZA
8   World Bank; Yale U and Abdul Latif Jameel Poverty Action Lab; Dartmouth College
9   Carnegie Mellon U; Carnegie Mellon U; Carnegie Mellon U
10  Columbia U; U CA, San Diego
11  U CA, Berkeley; McMaster U; McMaster U
12  ETH Zurich and CESifo; U Copenhagen and CESifo

I want to separate the rows at the semicolons (and preferably at the "and"s) so that I can work out which academic institutions are unique.

I tried doing this with the `separate_rows` function from the tidyr package:

Affiliation<-separate_rows(Affiliation, sep=";")

Alternatively:

Affiliation<-separate_rows(Affiliation, sep="; | and")

Neither of these works: my data looks exactly the same afterwards. What am I doing wrong?

Attaching dput output below:

structure(list(AF = c("MIT", "NBER; NBER", "U MI; Cornell U; U VA", 
"Harvard U; U Chicago", "U OR; U CA, Davis; U British Columbia", 
"World Bank; Dartmouth College; EDHEC Business School; Harvard U", 
"Columbia U and IZA; Columbia U and IZA", "World Bank; Yale U and Abdul Latif Jameel Poverty Action Lab; Dartmouth College", 
"Carnegie Mellon U; Carnegie Mellon U; Carnegie Mellon U", "Columbia U; U CA, San Diego", 
"U CA, Berkeley; McMaster U; McMaster U", "ETH Zurich and CESifo; U Copenhagen and CESifo", 
"U MN, St Paul; Compass Lexecon, Washington, DC; Harvard U", 
"U WI", "U Chicago and IZA; Harvard U; Harvard U")), row.names = c(NA, 
15L), class = "data.frame")
Magnus
  • Try `result <- Affiliation %>% separate_rows(AF, sep = ";|and")` – Shree Aug 13 '19 at 21:18
  • Right, that seems to work. Are you saying this function will only work when used in a pipeline? – Magnus Aug 13 '19 at 21:19
  • No, you need to pass the actual column to be separated, i.e. `AF` in this case. You passed a data frame but no columns, and according to the docs: *"... If empty, nothing happens"* – Shree Aug 13 '19 at 21:20
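
To make the point from the comments concrete, here is a minimal sketch that names the column explicitly. It uses a three-row subset of the dput data, and the separator pattern is my own assumption rather than the one from the comments: it splits on a semicolon or on the standalone word "and" and absorbs the surrounding spaces, so the pieces need no trimming and an "and" inside a longer word is left alone. The names `result` and `unique_institutions` are just illustrative.

```r
library(tidyr)

# Small subset of the data from the question
Affiliation <- data.frame(
  AF = c("MIT", "NBER; NBER", "Columbia U and IZA; Columbia U and IZA"),
  stringsAsFactors = FALSE
)

# Name the column to split (AF). With no column selected,
# separate_rows() has nothing to act on and returns the data unchanged.
# sep is a regular expression: a semicolon, or the word "and" with
# whitespace on both sides (assumed pattern, adjust to your data).
result <- separate_rows(Affiliation, AF, sep = "\\s*;\\s*|\\s+and\\s+")

# Unique institutions after splitting
unique_institutions <- sort(unique(result$AF))
print(unique_institutions)
```

The same call works in a pipeline (`Affiliation %>% separate_rows(AF, sep = ...)`); the pipe is not what makes it work, the column argument is.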
