3

I am quite new to the whole programing stuff, but i need to skript reproducable for large datasets. I hope I provided a sufficient example.

I have a dataframe like this (with 8 more "Nutrients" and 5 more "trade-elements" and much more Years):

Year<-c(1961,1962)
Total_Energy_kcal_Production<-c(5,8)
Total_Energy_kcal_Import<-c(6,1)
Total_Ca_g_Production<-c(3,4)
Total_Ca_g_Import<-c(3,8)
df<-cbind(Year,Total_Energy_kcal_Production, Total_Energy_kcal_Import, Total_Ca_g_Production, Total_Ca_g_Import)

looks like:

Year  Total_Energy_kcal_Production   Total_Energy_kcal_Import   Total_Ca_g_Production    Total_Ca_g_Import 
1961   5                              6                          3                       3
1962   8                              1                          4                       8

and I want it to look like this:

Year  Nutrient            Production        Import
1961  Total_Energy_kcal   5                 6
1962  Total_Energy_kcal   8                 1
1961  Total_Ca_g          3                 3 
1962  Total_Ca_g          4                 8

I tried a lot with pivot_longer and names_patern. I thought this would work, but I do not fully understand the arguments:

df_piv<-df%>%
  pivot_longer(cols = -Year, names_to = "Nutrient", 
              names_pattern = ".*(?=_)")

I get an error-message that i can not interprete:

Error: Can't select within an unnamed vector.
Annika
  • 49
  • 7

1 Answers1

8

You can provide names_pattern regex as :

tidyr::pivot_longer(df, 
                    cols = -Year, 
                    names_to = c('Nutrient', '.value'),
                    names_pattern = '(.*)_(\\w+)')

#   Year Nutrient          Production Import
#  <dbl> <chr>                  <dbl>  <dbl>
#1  1961 Total_Energy_kcal          5      6
#2  1961 Total_Ca_g                 3      3
#3  1962 Total_Energy_kcal          8      1
#4  1962 Total_Ca_g                 4      8

This will put everything until the last underscore in Nutrient column and the remaining data is kept as column name.

data

cbind will create a matrix, use data.frame to create data.

df<-data.frame(Year,Total_Energy_kcal_Production,Total_Energy_kcal_Import, 
               Total_Ca_g_Production, Total_Ca_g_Import)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks a lot! The cbind caused the problem when I tried this before. Thanks a lot! – Annika Feb 22 '21 at 11:41
  • @Ronak shah, Could you explain '\\w+'? I also tried df3 %>% pivot_longer(cols = Total_Energy_kcal_Production:Total_Ca_g_Import , names_to = c('Nutrient','.value'), names_pattern='(..*)_(.*)' ) and it work! But I don not know why? – Masoud Sep 16 '21 at 19:06
  • It depends on the column names that we have. Here, there is only one word which we want as column name (`Production` and `Import`) so `\\w+` works. Since regex is greedy your option with `(.*)_(.*)` would also work. – Ronak Shah Sep 17 '21 at 01:27