
I want to assign a number to each group in a data frame. For example, I have the following data frame:

df = data.frame( from = c('a', 'a', 'b'), dest = c('b', 'c', 'd') )
#> df
#  from dest
#1    a    b
#2    a    c
#3    b    d

I want to group by the values of from and assign a group number to each group. This is the expected result:

result = data.frame( from = c('a', 'a', 'b'), dest = c('b', 'c', 'd'), group_no = c(1,1,2) )
#> result
#  from dest group_no
#1    a    b        1
#2    a    c        1
#3    b    d        2

I can solve this problem using a for loop as follows:

groups = df$from %>% unique
i = 0
df$group_no = NA
for ( g in groups ) {
    i = i + 1
    df[ df$from == g, ]$group_no = i
}
#> df
#  from dest group_no
#1    a    b        1
#2    a    c        1
#3    b    d        2

Is there a more elegant, functional way to solve this without a for loop? Specifically, can it be done with the dplyr::group_by function?

Mert Nuhoglu

3 Answers

Use mutate to add a column containing the integer codes of from converted to a factor:

df %>% mutate(group_no = as.integer(factor(from)))

#   from dest group_no
# 1    a    b        1
# 2    a    c        1
# 3    b    d        2

Note that group_by isn't necessary here unless you need the grouping for other purposes. If you want to group by the new column for later use, you can create the column with group_by instead of mutate.
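
As a minimal sketch of that last point, assuming dplyr is installed: group_by accepts a named expression, so the column can be created and grouped on in one step. The names grouped and counts are illustrative only and do not come from the original answer.

```r
library(dplyr)

df <- data.frame(from = c('a', 'a', 'b'), dest = c('b', 'c', 'd'))

# group_by() evaluates the named expression, adds it as a column,
# and groups the result by that new column
grouped <- df %>% group_by(group_no = as.integer(factor(from)))

# The grouping can then be reused, for example to count rows per group
counts <- grouped %>% summarise(n = n())
```

Here grouped still contains all three rows plus group_no, and counts has one row per group.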

alistaire
  • No need to use group_by at all: `df %>% mutate(group_no = as.integer(from))`. The OP is going in the wrong direction. – Ven Yao Mar 16 '16 at 06:03
  • Was editing to mention that as you commented... – alistaire Mar 16 '16 at 06:04
  • Instead of `factor()`, consider using `fct_inorder()` or another `fct_` function to specify the factor order. `fct_inorder()` orders the levels by first appearance in the data frame, whereas `factor()` uses its default level order (sorted, so alphabetical for character input) – Nick Oct 23 '20 at 20:14
  • You can create groups in the order of the vector like `forcats::fct_inorder(x)` with `factor(x, levels = unique(x))` if you like, but that's not part of the question. Generally speaking, row order doesn't mean anything and is often not guaranteed unless you explicitly sort first. If you're creating a factor to have a factor (not to create indices like here), you'll want to specify levels somehow. – alistaire Nov 02 '20 at 06:07
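
The ordering point raised in these comments can be illustrated with a small base R sketch. The vector x is made-up example data, chosen so that alphabetical order and order of first appearance differ.

```r
x <- c('b', 'a', 'b', 'c')

# Default factor(): levels are the sorted unique values, so 'a' gets index 1
alphabetical <- as.integer(factor(x))

# Levels given in order of first appearance, so 'b' gets index 1
in_order <- as.integer(factor(x, levels = unique(x)))

# match() against the unique values yields the same first-appearance numbering
via_match <- match(x, unique(x))
```

For the question's data the two orderings coincide, because 'a' appears before 'b'.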

We can use group_indices_ from dplyr:

library(dplyr)
df %>% 
   mutate(group_no = group_indices_(., .dots="from"))
#  from dest group_no
#1    a    b        1
#2    a    c        1
#3    b    d        2
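
group_indices_ belongs to dplyr's older underscore-suffixed interface, which later releases deprecated. Assuming dplyr 1.0.0 or newer is installed, a sketch of the same idea with cur_group_id() could look like this (the variable name result is illustrative):

```r
library(dplyr)

df <- data.frame(from = c('a', 'a', 'b'), dest = c('b', 'c', 'd'))

# cur_group_id() returns the integer id of the current group inside mutate()
result <- df %>%
  group_by(from) %>%
  mutate(group_no = cur_group_id()) %>%
  ungroup()  # drop the grouping once the ids are assigned
```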

A similar option using data.table is

library(data.table)
setDT(df)[, group_no := .GRP, by = from]
akrun

You can try transform from base R:

transform(df,group_no=as.numeric(factor(from)))

#  from dest group_no
#1    a    b        1
#2    a    c        1
#3    b    d        2

If the from column is already a factor, you can drop the factor() call and use only:

transform(df,id=as.numeric(from))
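
The factor condition matters: since R 4.0.0, data.frame() keeps strings as character vectors by default, and as.numeric() on a character column such as from yields NA rather than group numbers. A small sketch of the difference, with stringsAsFactors set explicitly so it behaves the same on any R version (direct and via_factor are illustrative names):

```r
df <- data.frame(from = c('a', 'a', 'b'), dest = c('b', 'c', 'd'),
                 stringsAsFactors = FALSE)

# from is a character column here, so direct conversion gives only NA
# (R also emits a coercion warning, suppressed here)
direct <- suppressWarnings(as.numeric(df$from))

# Wrapping in factor() first works for both character and factor columns
via_factor <- transform(df, group_no = as.numeric(factor(from)))
```
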
Ronak Shah