0

This is a snippit of my data

Aaron Brook   450 36 25 .124 .530
Arron Afflalo 980 25 40 .128 .597
Arron Afflalo 236 25 94 .223 .621

I'm trying to combine the Arron Afflalo rows but at the same time adding column 1-3 but taking the mean of columns 4-5.

So the final data output looks like

Arron Afflalo 1216 50 134 .176 .609

Is there any quick way to do this. If not, is there a way just to delete all duplicate rows?

markus
  • 25,843
  • 5
  • 39
  • 58
Peter
  • 79
  • 7
  • This is easy to do in `data.table`. `library(data.table) setDT(df) df[,.(sum(col1),sum(col2),sum(col3),mean(col4),mean(col5)), by=name_col]` – Corey Levinson Dec 07 '18 at 21:35
  • 4
    Possible duplicate of [Apply several summary functions on several variables by group in one call](https://stackoverflow.com/questions/12064202/apply-several-summary-functions-on-several-variables-by-group-in-one-call) – markus Dec 07 '18 at 21:39

2 Answers2

1

Using dplyr:

library(dplyr)
df <- data.frame(name = c("Aaron Brook", "Arron Afflalo", "Arron Afflalo"), 
                 x = c(450, 980, 236), y = c(36, 25, 25), 
                 a = c(.124, .128, .223), b = c(.530, .597, .621))

df %>% 
   filter(name == "Arron Afflalo") %>%
   group_by(name) %>%
   summarize(sum_x = sum(x), sum_y = sum(y), mean_a = mean(a), mean_b = mean(b))

If you don't know the names of the columns and just their order, you could also substitute last line with:

   summarize(sum_x = sum(.[[2]]), sum_y = sum(.[[3]]), mean_a = mean(.[[4]]), mean_b = mean(.[[5]]))
godot
  • 1,550
  • 16
  • 33
  • Could this be a little more general? Arron Afflalo was kinda of an example. My data is hundreds of rows long and I would say 15 or so would be duplicates – Peter Dec 07 '18 at 22:28
  • The best would be to give an exemple (or a counter-exemple in this case) but if I understand correctly, you can juste remove the `filter` then you'll have 1 line per name... – godot Dec 07 '18 at 22:31
1

Using tidyverse, if you want a summary for all names and surnames:

df %>%
 group_by(V1, V2) %>%
 mutate_at(3:5, funs(sum(.))) %>%
 mutate_at(6:7, funs(mean(.))) %>%
 summarise_all(funs(first(.)))

  V1    V2         V3    V4    V5    V6    V7
  <fct> <fct>   <int> <int> <int> <dbl> <dbl>
1 Aaron Brook     450    36    25 0.124 0.530
2 Arron Afflalo  1216    50   134 0.176 0.609

Or if you want a summary just for that specific name:

df %>%
 filter(V1 == "Arron" & V2 == "Afflalo") %>%
 mutate_at(3:5, funs(sum(.))) %>%
 mutate_at(6:7, funs(mean(.))) %>%
 summarise_all(funs(first(.)))

     V1      V2   V3 V4  V5     V6    V7
1 Arron Afflalo 1216 50 134 0.1755 0.609

Data:

df <- read.table(text = "Aaron Brook   450 36 25 .124 .530
Arron Afflalo 980 25 40 .128 .597
Arron Afflalo 236 25 94 .223 .621")
tmfmnk
  • 38,881
  • 4
  • 47
  • 67