How do you group values in one column for each unique value in another in R?

Question

This is an elementary question, but I have been stuck on it for quite some time. I'm trying to group the values in ColumnB but only within each value in ColumnA.

The initial data frame would be something like:

ColumnA = c(1,1,1,2,2,2)
ColumnB = c("f","g","g","f","f","h")
df <- data.frame(ColumnA,ColumnB)

ColumnA    ColumnB
    1         f
    1         g
    1         g
    2         f
    2         f
    2         h

The result would be:

ColumnA    ColumnB
    1         f
    1         g
    2         f
    2         h

(One approach I tried was dplyr's group_by(df, ColumnB), but it did not produce this result.)

Tim Biegeleisen · Answer 1 · 2015-03-31T13:11:08.563

8

The unique function is uniquely suited (no pun intended) to solve your problem:

df <- data.frame(v1=c(1,1,1,2,2,2), v2=c("f", "g", "g", "f", "f", "h"))
df <- unique(df)

> df
  v1 v2
1  1  f
2  1  g
4  2  f
6  2  h

edited Mar 31 '15 at 13:11

answered Mar 31 '15 at 12:54

Tim Biegeleisen


score 3 · Accepted Answer · answered Mar 31 '15 at 13:09

3

You can also try duplicated():

df[!duplicated(df),]
#   ColumnA ColumnB
#1       1       f
#2       1       g
#4       2       f
#6       2       h

If needed, !duplicated(df) on its own gives the logical index of the rows to keep.
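As a small sketch of that logical index, using the data frame from the question (the names keep and result are made up for illustration):

```r
# Data frame from the question
df <- data.frame(ColumnA = c(1, 1, 1, 2, 2, 2),
                 ColumnB = c("f", "g", "g", "f", "f", "h"))

# TRUE marks the first occurrence of each (ColumnA, ColumnB) pair
keep <- !duplicated(df)
keep
# [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE

# The index can be reused, e.g. to count the rows that survive
sum(keep)
# [1] 4

# Subsetting with it gives the same rows as df[!duplicated(df), ]
result <- df[keep, ]
```

Keeping the index in a variable is handy when the same row selection has to be applied to another object of the same length.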

answered Mar 31 '15 at 13:09

akrun


score 2 · Answer 3 · answered Mar 31 '15 at 13:05

2

With dplyr, you need to perform an operation after grouping; grouping alone does not collapse the rows. You could calculate something with summarise(), pick one row within each group based on a variable, etc. Here's an example that uses slice() to select the first record within each group combination:

library(dplyr)
df %>%
  group_by(ColumnA, ColumnB) %>%
  slice(1) # select the first row within each group combination

Source: local data frame [4 x 2]
Groups: ColumnA, ColumnB

  ColumnA ColumnB
1       1       f
2       1       g
3       2       f
4       2       h
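If the only goal is to drop repeated rows, dplyr also offers distinct(). This is a different technique from the group_by() plus slice() approach in this answer, not part of it. A minimal sketch, assuming dplyr is installed and using the question's data (the name deduped is made up for illustration):

```r
library(dplyr)

# Data frame from the question
df <- data.frame(ColumnA = c(1, 1, 1, 2, 2, 2),
                 ColumnB = c("f", "g", "g", "f", "f", "h"))

# Keep one row per unique (ColumnA, ColumnB) combination
deduped <- df %>%
  distinct(ColumnA, ColumnB)

nrow(deduped)
# [1] 4
```

Unlike the grouped version, the result here carries no grouping, so no ungroup() step is needed afterwards.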

answered Mar 31 '15 at 13:05

Sam Firke


Thank you Sam for the response--If more than one row has to be grouped however, is there an efficient method to group each one at once? – joat1 Mar 31 '15 at 13:12
Is your question about the grouping variables (columns) or the rows within subgroups? If you want to select a different row or multiple rows within your subgroup, you can edit the last line above; if you want to group by many columns and don't want to type them out, you can pass them as a variable: http://stackoverflow.com/questions/21208801/group-by-multiple-columns-in-dplyr-using-string-vector-input – Sam Firke Mar 31 '15 at 13:18
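
One way to pass the grouping columns as a variable, sketched here under the assumption of dplyr 1.0.0 or later (where across() and all_of() are available), is shown below. The names group_cols and first_rows are hypothetical, and the linked question may describe a different mechanism.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(ColumnA = c(1, 1, 1, 2, 2, 2),
                 ColumnB = c("f", "g", "g", "f", "f", "h"))

# Grouping columns held in a character vector (hypothetical name)
group_cols <- c("ColumnA", "ColumnB")

first_rows <- df %>%
  group_by(across(all_of(group_cols))) %>%  # group by every listed column
  slice(1) %>%                              # first row of each group
  ungroup()                                 # drop the grouping afterwards

nrow(first_rows)
# [1] 4
```

In base R, df[!duplicated(df[group_cols]), ] selects the same rows without any extra packages.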
