0

I have a data-frame like this:

ID   A   B   C   D
1    1   0   x1  y
1    0   0   x1  y
1    0   0   x1  y
2    0   0   x6  v
2    0   1   x6  v
2    0   0   x6  v
3    0   1   r5  t
3    1   0   r5  t
3    0   0   r5  t

column ID is unique and I want to remove all duplicated and aggregate column A and B by showing max or sum (there could be maximum only one '1' for each ID in column A or B) and display columns C and D that has same value for all unique IDs.

ID   A   B   C   D
1    1   0   x1  y
2    0   1   x6  v
3    1   1   r5  t

by using

aggregate(A~ID,data=df,FUN=sum)

I get only ID and column A but how could I aggregate A and B in the same time and also show columns C and D

Sajjad
  • 67
  • 2
  • 8

1 Answers1

2

You can use . to aggregate both columns A and B and specify C and D as group variables if you want them to show in the results:

aggregate(. ~ ID + C + D,data=df,FUN=sum)
#   ID  C D A B
# 1  3 r5 t 1 1
# 2  2 x6 v 0 1
# 3  1 x1 y 1 0

Another trick is to use cbind to explicitly specify which columns you want to aggregate:

aggregate(cbind(A, B) ~ ID + C + D,data=df,FUN=sum)

which gives the same result.

Psidom
  • 209,562
  • 33
  • 339
  • 356
  • thanks. I have more than 70 columns like C and D to display, Is there any way to avoid typing 70 names? – Sajjad Jul 26 '16 at 14:19
  • All the 70 columns are unique for each ID? Could try something like: `aggregate(cbind(A, B) ~ .,data=df,FUN=sum)` – Psidom Jul 26 '16 at 14:20
  • They all have same values for each unique ID like column C and D. – Sajjad Jul 26 '16 at 14:23
  • See the above comments. On the other hand, I would possibly prefer to extract those columns out as a separate data frame. aggregate on A, B and ID and then join back on ID, which might be more efficient. – Psidom Jul 26 '16 at 14:25