
This is an elementary question, but I have been stuck on it for quite some time. I'm trying to group the values in ColumnB, but only within each value of ColumnA, so that each ColumnB value appears only once per ColumnA value.

The initial data frame would be something like:

ColumnA <- c(1, 1, 1, 2, 2, 2)
ColumnB <- c("f", "g", "g", "f", "f", "h")
df <- data.frame(ColumnA, ColumnB)

> df
ColumnA    ColumnB
    1         f
    1         g
    1         g
    2         f
    2         f
    2         h

The result would be:

ColumnA    ColumnB
    1         f
    1         g
    2         f
    2         h

(One approach I tried was dplyr's group_by(df, ColumnB), but it did not produce this result.)

ekad
joat1

3 Answers


The unique function is uniquely suited (no pun intended) to solve your problem:

df <- data.frame(v1 = c(1, 1, 1, 2, 2, 2), v2 = c("f", "g", "g", "f", "f", "h"))
df1 <- unique(df)  # keep only the first occurrence of each distinct row

> df1
  v1 v2
1  1  f
2  1  g
4  2  f
6  2  h
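One caveat worth knowing: unique() on a data frame compares whole rows, so any extra column that differs between rows stops them from collapsing. The sketch below assumes a hypothetical extra column `id` (not in the question) to show the difference, and then restricts unique() to the two key columns.

```r
# Question's data plus a hypothetical `id` column that differs on every row
df <- data.frame(ColumnA = c(1, 1, 1, 2, 2, 2),
                 ColumnB = c("f", "g", "g", "f", "f", "h"),
                 id = 1:6)

all_rows <- unique(df)                           # id makes every row distinct: 6 rows
pairs    <- unique(df[c("ColumnA", "ColumnB")])  # compare only the key columns: 4 rows
pairs
```

Note that the second form drops `id` from the result; if you need to keep the other columns, index the original data frame with duplicated() instead.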
Tim Biegeleisen

You can also try duplicated():

df[!duplicated(df),]
#   ColumnA ColumnB
#1       1       f
#2       1       g
#4       2       f
#6       2       h

If needed, !duplicated(df) on its own also gives a logical index of the rows to keep.
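A sketch of how that logical index can be used when the data frame has more columns than the ones you deduplicate on. The extra column `id` is hypothetical; duplicated() is applied only to ColumnA and ColumnB, and the resulting index is used to subset the full data frame so `id` is retained for the first row of each pair.

```r
# Question's data plus a hypothetical `id` column
df <- data.frame(ColumnA = c(1, 1, 1, 2, 2, 2),
                 ColumnB = c("f", "g", "g", "f", "f", "h"),
                 id = 1:6)

# TRUE for the first occurrence of each (ColumnA, ColumnB) pair
keep <- !duplicated(df[c("ColumnA", "ColumnB")])

keep          # TRUE TRUE FALSE TRUE FALSE TRUE
which(keep)   # 1 2 4 6
df[keep, ]    # all columns, including id, for the kept rows
```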

akrun

With dplyr, you'd want to perform an operation after grouping them; the grouping alone does not collapse the rows. You could calculate something with summarise(), pick one row within the group based on a variable, etc. Here's an example with slice() to select the first record within each group combination:

library(dplyr)
df %>%
  group_by(ColumnA, ColumnB) %>%
  slice(1) # select the first row within each group combination

Source: local data frame [4 x 2]
Groups: ColumnA, ColumnB

  ColumnA ColumnB
1       1       f
2       1       g
3       2       f
4       2       h
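If you want the same "first row per group" idea without loading dplyr, here is a base R sketch (my own analogue of group_by() + slice(1), not something dplyr provides). ave() numbers each row within its (ColumnA, ColumnB) combination, and rows numbered 1 are kept; the name `pos` is just illustrative.

```r
df <- data.frame(ColumnA = c(1, 1, 1, 2, 2, 2),
                 ColumnB = c("f", "g", "g", "f", "f", "h"))

# Position of each row within its (ColumnA, ColumnB) group: 1, 2, ...
pos <- ave(seq_len(nrow(df)), df$ColumnA, df$ColumnB, FUN = seq_along)

first_rows <- df[pos == 1, ]  # first row of each group, original order kept
first_rows
```

Changing the condition (for example `pos <= 2`) selects a different number of rows per group, much like changing the argument to slice().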
Sam Firke
  • Thank you Sam for the response--If more than one row has to be grouped however, is there an efficient method to group each one at once? – joat1 Mar 31 '15 at 13:12
  • Is your question about the grouping variables (columns) or the rows within subgroups? If you want to select a different row or multiple rows within your subgroup, you can edit the last line above; if you want to group by many columns and don't want to type them out, you can pass them as a variable: http://stackoverflow.com/questions/21208801/group-by-multiple-columns-in-dplyr-using-string-vector-input – Sam Firke Mar 31 '15 at 13:18
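Both points in that comment (grouping columns held in a string vector, and picking a row other than the first) also have a short base R form. This is a sketch under two assumptions: the column names are stored in a hypothetical vector `group_cols`, and "a different row" means the last row of each group, which duplicated() handles through its `fromLast` argument. The `id` column is hypothetical and only there to show which rows survive.

```r
df <- data.frame(ColumnA = c(1, 1, 1, 2, 2, 2),
                 ColumnB = c("f", "g", "g", "f", "f", "h"),
                 id = 1:6)  # hypothetical extra column

group_cols <- c("ColumnA", "ColumnB")  # grouping columns as a character vector

# fromLast = TRUE scans from the bottom, so the LAST row of each group is kept
last_rows <- df[!duplicated(df[group_cols], fromLast = TRUE), ]
last_rows$id  # 1 3 5 6
```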