-3

I have a table where I have a variable Technology, which includes "AllRenewables", "Biomass","Solar","Offshore wind", "Onshore wind" and "Wind".

I would like that the "All Renewables" is split into "Biomass","Solar","Offshore wind", "Onshore wind" and that "Wind" technology should be split into ""Offshore wind", "Onshore wind".

The table looks approximately as follows:

Table

 Year   Country   Technology       Changes
  2000   A       Solar             1
  2000   A       Wind              2
  2000   A       Onshore wind      2
  2000   A       All Renewables    3

It should look as follows after the re-structuring:

Table

  Year   Country    Technology       Changes
  2000     A           Solar           1
  2000     A           Onshore wind    2
  2000     A           Offshore wind   2
  2000     A           Onshore wind    3
  2000     A           Biomass         3
  2000     A           Solar           3
  2000     A           Onshore wind    3
  2000     A           Offshore wind   3

If anybody could help, I would be really really thankful. Sarah

Community
  • 1
  • 1
Sarah8888
  • 23
  • 1
  • 5
  • 1
    Please provide sample data in a copy&paste-able format (e.g. using `dput`). It might be useful to take a look at how to provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Maurits Evers Aug 06 '18 at 08:51
  • Sorry for this, I will in the future. Thank you. – Sarah8888 Aug 07 '18 at 12:26

2 Answers2

1

You could rename factor levels and use tidyr::separate_rows

lvls <- c(
    "Biomass, Solar, Offshore wind, Onshore wind",
    "Onshore wind",
    "Solar",
    "Offshore wind, Onshore wind")
levels(df$Technology) <- lvls;

library(tidyverse)
df %>% separate_rows(Technology, sep = ", ") %>%
    group_by_all() %>%
    slice(1) %>%
    ungroup() %>%
    arrange(Changes)
## A tibble: 7 x 4
#   Year Country Technology    Changes
#  <int> <fct>   <chr>           <int>
#1  2000 A       Solar               1
#2  2000 A       Offshore wind       2
#3  2000 A       Onshore wind        2
#4  2000 A       Biomass             3
#5  2000 A       Offshore wind       3
#6  2000 A       Onshore wind        3
#7  2000 A       Solar               3

Explanation: We redefine factor levels such that "All Renewables" becomes "Biomass, Solar, Offshore wind, Onshore wind" and "Wind" becomes "Offshore wind, Onshore wind". Then we use tidyr::separate_rows to split entries with a comma into separate rows. All that remains are removal of duplicates and re-ordering of rows.


Sample data

df <- read.table(text =
    "Year  Country  Technology  Changes
2000  A  'Solar'  1
2000  A  'Wind'  2
2000  A  'Onshore wind'  2
2000  A  'All Renewables'  3", header = T)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • 1
    Thank you! So, sorry, I am still not 100% sure, how exactly can I make that the renewables are split into the different technologies (and wind into offshore and onshore)? this is my major issue :( thank you again! – Sarah8888 Aug 06 '18 at 09:13
  • 1
    @Sarah8888 That's exactly what I'm showing in my solution. I've added an explanation to clarify. Please take a look. – Maurits Evers Aug 06 '18 at 09:17
0

Just a question of merging (with tidyverse) :

# Your data:
df <- read.csv(textConnection("Y, A, B, C
2000,A,Solar,1
2000,A,Wind,2
2000,A,Onshore wind,2
2000,A,All Renewables,3"),stringsAsFactors=FALSE)
# Your synonyms: 
c <- read.csv(textConnection("B, D
All Renewables,Biomass
All Renewables,Solar
All Renewables,Offshore wind
All Renewables,Onshore wind
Wind,Offshore wind
Wind,Onshore wind"),stringsAsFactors=FALSE)

df %>% left_join(c,by="B") %>% mutate(B=coalesce(D,B)) %>% select(-D)
#     Y A              B C
#1 2000 A          Solar 1
#2 2000 A  Offshore wind 2
#3 2000 A   Onshore wind 2
#4 2000 A   Onshore wind 2
#5 2000 A        Biomass 3
#6 2000 A          Solar 3
#7 2000 A  Offshore wind 3
#8 2000 A   Onshore wind 3
Nicolas2
  • 2,170
  • 1
  • 6
  • 15