I have a dataset in which my unit of analysis and variable are together in the same column. I would like to split this up into two variables, but not sure how to because it involves regex.
Say I have the following data:
de <- data.frame(year = c(2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012),
company = c("Company A - Variable 1", "Company A - Variable 2", "Company A - Variable 3",
"CompanyB - Variable 1", "CompanyB - Variable 2", "CompanyB - Variable 3",
"Compan- C - Variable 1", "Compan- C - Variable 2", "Compan- C - Variable 3",
"Company A - Variable 1", "Company A - Variable 2", "Company A - Variable 3",
"CompanyB - Variable 1", "CompanyB - Variable 2", "CompanyB - Variable 3",
"Compan- C - Variable 1", "Compan- C - Variable 2", "Compan- C - Variable 3"),
score = c(1,3,5,7,9,11,13,15,17,2,4,6,8,10,12,14,16,18))
Which looks like:
year company score
<dbl> <chr> <dbl>
2010 Company A - Variable 1 1
2010 Company A - Variable 2 3
2010 Company A - Variable 3 5
2010 CompanyB - Variable 1 7
2010 CompanyB - Variable 2 9
2010 CompanyB - Variable 3 11
2010 Compan- C - Variable 1 13
2010 Compan- C - Variable 2 15
2010 Compan- C - Variable 3 17
2012 Company A - Variable 1 2
2012 Company A - Variable 2 4
2012 Company A - Variable 3 6
2012 CompanyB - Variable 1 8
2012 CompanyB - Variable 2 10
2012 CompanyB - Variable 3 12
2012 Compan- C - Variable 1 14
2012 Compan- C - Variable 2 16
2012 Compan- C - Variable 3 18
The company variable is a combination of company names (in all different forms and shapes), always followed by " - " and then the variable name. I want to wrangle the data so that I arrive at the following structure, splitting up the company and the variable into new columns:
year company variable score
2010 Company A Variable 1 1
2010 Company A Variable 2 3
2010 Company A Variable 3 5
2010 CompanyB Variable 1 7
2010 CompanyB Variable 2 9
2010 CompanyB Variable 3 11
2010 Compan- C Variable 1 13
2010 Compan- C Variable 2 15
2010 Compan- C Variable 3 17
2012 Company A Variable 1 2
2012 Company A Variable 2 4
2012 Company A Variable 3 6
2012 CompanyB Variable 1 8
2012 CompanyB Variable 2 10
2012 CompanyB Variable 3 12
2012 Compan- C Variable 1 14
2012 Compan- C Variable 2 16
2012 Compan- C Variable 3 18
I guess then use something to extract the text before " - " (the company name) and after " - " (the variable name). Then sort of split this up into two different variables: company and variable. Any elegant solution would be much appreciated. Thanks