
I'm currently stuck on the following problem:

I have two columns that together serve as a "primary key", and a column with values that I want to sum. Importantly, I want the summed value copied to every record that shares the same "primary key".

So it should go something like this:

Col1 Col2 Col3 la01
A1   B1   EPP  1
A1   B2   EPP  1
A1   B1   EPQ  2

should transform into:

Col1 Col2 Col3 la01
A1   B1   EPP  3
A1   B2   EPP  1
A1   B1   EPQ  3

I've had some success using group_by() with one of the summarise() variants, but the best I got was the sums collapsed into one row per group, whereas I need each sum copied to every matching record.

Looking forward to your thoughts and answers.

Doniu

1 Answer


You should use mutate() instead of summarise() when your data frame is grouped. summarise() collapses each group to a single row, while mutate() keeps every row (so the data frame keeps its dimensions) and writes the group's sum into each of them.

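library(dplyr)

# Note: before R 4.0.0, data.frame() converted strings to factors by
# default, which is why the columns print as <fctr> in the output
df <- data.frame(Col1 = rep('A1', 3), 
                 Col2 = c('B1', 'B2', 'B1'), 
                 Col3 = c('EPP', 'EPP', 'EPQ'), 
                 la01 = c(1,1,2))

# mutate() writes each (Col1, Col2) group's sum into every row of that group
df %>% 
   group_by(Col1, Col2) %>% 
   mutate(la01 = sum(la01)) %>% 
   ungroup()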

# A tibble: 3 x 4
    Col1   Col2   Col3  la01
  <fctr> <fctr> <fctr> <dbl>
1     A1     B1    EPP     3
2     A1     B2    EPP     1
3     A1     B1    EPQ     3
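To make the difference concrete, here is a sketch on the same sample data, assuming dplyr is installed (the object names `collapsed` and `df_base` are just illustrative). It first runs summarise() on the same grouping, which is what the question describes getting: one row per key, with Col3 dropped. It then shows a base R alternative that is not part of the answer above, ave(), which, like mutate(), returns one value per original row.

```r
library(dplyr)

df <- data.frame(Col1 = rep("A1", 3),
                 Col2 = c("B1", "B2", "B1"),
                 Col3 = c("EPP", "EPP", "EPQ"),
                 la01 = c(1, 1, 2))

# summarise() collapses each (Col1, Col2) group to a single row:
# the sums are correct, but Col3 and the per-record rows are gone
collapsed <- df %>%
  group_by(Col1, Col2) %>%
  summarise(la01 = sum(la01)) %>%
  ungroup()

# Base R alternative without dplyr: ave() applies FUN within each
# combination of the grouping vectors and returns a vector aligned
# with the original rows, just like the mutate() version
df_base <- df
df_base$la01 <- ave(df_base$la01, df_base$Col1, df_base$Col2, FUN = sum)
```

Either way, the key point is the same: you need a function that returns one value per input row (mutate() or ave()), not one per group (summarise()).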
demarsylvain
  • P.S. You can paste tables straight into `fread()` from data.table, e.g. `df <- fread('[paste table here]')`, to avoid recreating the data frame by hand. – IceCreamToucan Feb 08 '18 at 14:32
  • Works great, thank you! I guess I got lost in all the options R has. – Doniu Feb 08 '18 at 14:56
  • @demarsylvain One more question: why do you add `ungroup()` at the end? – Doniu Feb 09 '18 at 08:09
  • If you leave the table grouped, the grouping affects every subsequent operation on it. For instance, you can't remove or modify the column Col1, because it's a grouping variable. If that's what you want, you can leave the table grouped, but keep it in mind. – demarsylvain Feb 09 '18 at 14:14
  • I've tried this solution on a much larger dataset, and it seems to sum the values of all records instead of grouping by the value in each row. The only difference is that I used only one column in the group_by() call instead of two. I'll try using two columns to see if that changes the outcome. – Zombraz May 07 '19 at 21:03
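To illustrate the last two comments on the sample data, here is a sketch assuming dplyr is installed (the object names are illustrative only). It shows the grouping that stays attached when ungroup() is skipped, and what happens when only one column is passed to group_by(). It cannot diagnose the larger dataset mentioned in the last comment, which isn't shown; it only demonstrates that the sum is taken over every record sharing the grouping values.

```r
library(dplyr)

df <- data.frame(Col1 = rep("A1", 3),
                 Col2 = c("B1", "B2", "B1"),
                 Col3 = c("EPP", "EPP", "EPQ"),
                 la01 = c(1, 1, 2))

# Without ungroup(), the result stays grouped by (Col1, Col2) ...
still_grouped <- df %>%
  group_by(Col1, Col2) %>%
  mutate(la01 = sum(la01))

group_vars(still_grouped)  # returns c("Col1", "Col2")

# ... so later verbs still run per group: n() counts rows in each group
per_group <- still_grouped %>% mutate(n_rows = n())

# After ungroup(), n() counts all rows of the table
whole <- still_grouped %>% ungroup() %>% mutate(n_rows = n())

# Grouping by Col1 alone: every row has Col1 == "A1", so all records
# form a single group and each one receives the grand total (4)
by_col1 <- df %>%
  group_by(Col1) %>%
  mutate(la01 = sum(la01)) %>%
  ungroup()
```

In short, the groups are defined by exactly the columns you pass to group_by(), so dropping one of the two key columns merges groups that should stay separate.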