0

I have a table like this

 id name            value
 1 PROD_N           NA    
 2 PROD_N_NATIONAL  0.789
 3 PROD_N1          0.5    
 4 PROD_N1_NATIONAL 0.2    
 5 PROD_N2          0.4    
 6 PROD_N2_NATIONAL 0.3   
 7 PROD_N3          NA    
 8 PROD_N3_NATIONAL 0.221    
 9 PROD_N4          0.2    
10 PROD_N4_NATIONAL  0.391

I need something like that (in order to make a gg plot grouped bar with fill).

ID year  prod  prod_national
1 2020   NA    0.789
2 2019   0.5   0.2
3 2018   0.4   0.3
4 2017   NA    0.221
5 2016   0.2   0.391

I'm almost sure I should use pivot_wider but I can't manage to achieve that. I always end with

id prod_n prod_n_national prod_n1 prod_n1_national ...

Also, it's another problem but even with the desired table I'm not sure how to be able to plot two 'y' value in the same graph. But I'll see when I'll be there.

Thanks for any tips !

Jujename
  • 27
  • 2

1 Answers1

2

I see two steps that might be handled before transforming the table into a wide format.

  1. The year column has to be created by the number inside the name column
  2. In the name column the numbers has to be removed.

Code

df  %>% 
  mutate(year = 2020 - replace_na(as.numeric(str_extract(name, "\\d")), 0),
         name = str_remove(name, "\\d")) %>% 
  pivot_wider(id_cols = year, 
              names_from = name,
              values_from = value) %>% 
  mutate(ID = row_number())

# A tibble: 5 x 4
   year PROD_N PROD_N_NATIONAL    ID
  <dbl>  <dbl>           <dbl> <int>
1  2020   NA             0.789     1
2  2019    0.5           0.2       2
3  2018    0.4           0.3       3
4  2017   NA             0.221     4
5  2016    0.2           0.39      5

Data

df <- structure(list(id = 1:10, name = structure(1:10, .Label = c("PROD_N", 
"PROD_N_NATIONAL", "PROD_N1", "PROD_N1_NATIONAL", "PROD_N2", 
"PROD_N2_NATIONAL", "PROD_N3", "PROD_N3_NATIONAL", "PROD_N4", 
"PROD_N4_NATIONAL"), class = "factor"), value = c(NA, 0.789, 
0.5, 0.2, 0.4, 0.3, NA, 0.221, 0.2, 0.39)), class = "data.frame", row.names = c(NA, 
-10L))

Plot

Plotting two y variables in ggplot depends on what kind of plot you need.

If you need two different geom_x you can do something like this:

ggplot(df_wide) +
  geom_line(aes(y = PROD_N, x = year)) +
  geom_point(aes(y = PROD_N_NATIONAL, x = year))

If you want the same geom_x but color the names differently, than you might need the long format:

df  %>% 
  mutate(year = 2020 - replace_na(as.numeric(str_extract(name, "\\d")), 0),
         name = str_remove(name, "\\d")) %>%
  ggplot() +
  geom_line(aes(y = value, x = year, color = name)) 
tamtam
  • 3,541
  • 1
  • 7
  • 21
  • Thanks, I just tried but it's weird, my 'prod_N' and 'prod_n_national' are now a list. Would you have an idea why ? – Jujename Mar 07 '21 at 15:58
  • Hey just for the record, thanks to this link https://stackoverflow.com/questions/57937949/pivot-wider-produces-nested-object I understood how to 'unchop' the list from pivot to wide – Jujename Mar 07 '21 at 16:42