2

I have a question about numbering the groups in a data.frame.

I found only one similar approach, here: dplyr-how-to-number-label-data-table-by-group-number-from-group-by

but it didn't work for me, and I don't know why.

library(dplyr)

S <- rep(letters[1:12], each = 6)
R <- sort(replicate(9, sample(5000:6000, 4)))
df <- data.frame(R, S)

# Closure meant to hand out the next integer on each call
get_next_integer <- function() {
  i <- 0
  function(S) { i <<- i + 1 }
}
get_integer <- get_next_integer()

result <- df %>% group_by(S) %>% mutate(label = get_integer())
result

Source: local data frame [72 x 3]
Groups: S [12]

       R      S label
   (int) (fctr) (dbl)
1   5058      a     1
2   5121      a     1
3   5129      a     1
4   5143      a     1
5   5202      a     1
6   5213      a     1
7   5239      b     1
8   5245      b     1
9   5269      b     1
10  5324      b     1
..   ...    ...   ...

I'm looking for an elegant solution in dplyr that numbers the groups, so that each letter gets a label from 1 to 12.

Alexander
  • Is there a reason to do this in `dplyr`? `df$label <- as.numeric(factor(df$S))` – hrbrmstr Nov 11 '15 at 02:53
  • @Frank, how is `df$label <- group_indices(df, S)` useless? – hrbrmstr Nov 11 '15 at 02:55
  • Actually, that's not the whole point of the package. Chaining is a nice additional component, but the whole point of the package was to provide a more standardized and sane way of doing data frame machinations. – hrbrmstr Nov 11 '15 at 02:57
  • @hrbrmstr Fair enough. Cleaning up my comments. One other common way: `match(df$S, unique(df$S))` – Frank Nov 11 '15 at 03:05
  • @hrbrmstr - can you explain why `df %>% group_indices(S)` works fine but `df %>% mutate(label=group_indices(S))` fails? I can't for the life of me figure out why it shouldn't just work. – thelatemail Nov 11 '15 at 03:38
  • @thelatemail it has no idea about the `.data` element as it's not built that way (ref: https://github.com/hadley/dplyr/issues/1185) and it doesn't look like it's on the menu to be supported in-situ any time soon (which is strange). It does seem like an odd-duck function in the dplyr family. – hrbrmstr Nov 11 '15 at 11:21
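
For readers on a newer dplyr: the sketch below assumes dplyr 1.0.0 or later, where `cur_group_id()` can be called inside `mutate()` on a grouped data frame. It is not part of the 2015 thread above, and the `set.seed()` call and the simplified `R` column are illustrative choices, not the asker's exact data.

```r
library(dplyr)

set.seed(1)  # illustrative seed, only to make the sketch reproducible
df <- data.frame(
  R = sort(sample(5000:6000, 72)),
  S = rep(letters[1:12], each = 6)
)

# cur_group_id() returns the index of the group currently being processed,
# so every row of a group receives the same label
result <- df %>%
  group_by(S) %>%
  mutate(label = cur_group_id()) %>%
  ungroup()

head(result, 8)
```

Groups are numbered in the order dplyr sorts the grouping keys, so with the letters `a` to `l` the labels run from 1 to 12.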

2 Answers

6

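Using as.numeric on the grouping column, treated as a factor, will do the trick.

library(dplyr)

S <- rep(letters[1:12], each = 6)
R <- sort(replicate(9, sample(5000:6000, 4)))
df <- data.frame(R, S)

# factor() keeps this working when S is a character column,
# which is what data.frame() produces by default since R 4.0
result <- df %>% mutate(label = as.numeric(factor(S))) %>% group_by(S)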

result
Source: local data frame [72 x 3]
Groups: S

      R S label
1  5018 a     1
2  5042 a     1
3  5055 a     1
4  5066 a     1
5  5081 a     1
6  5133 a     1
7  5149 b     2
8  5191 b     2
9  5197 b     2
10 5248 b     2
..  ... .   ...
N311V
  • Why not just `df %>% mutate(label=as.numeric(S))` - there's no need to group when you're working on the group variable itself. – thelatemail Nov 11 '15 at 03:39
  • @thelatemail Yeah, I was aware that it wasn't necessary but assumed the OP wanted it grouped based off the example code given, request for a dplyr solution, and use of the group-by tag. I'm not sure whether or not to take that bit out... – N311V Nov 11 '15 at 03:48
4

No need to use dplyr at all.

S <- rep(letters[1:12],each=6)
R = sort(replicate(9, sample(5000:6000,4)))
df <- data.frame(R,S)

df$label <- as.numeric(factor(df$S))
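
The two base R approaches mentioned in this thread, `as.numeric(factor(...))` and `match(x, unique(x))`, agree on the example data but can differ when the groups do not appear in sorted order. A small sketch on a made-up vector (the names `by_level` and `by_first` are hypothetical):

```r
S <- c("b", "b", "a", "a", "c")  # toy vector, deliberately not sorted

# factor() sorts the levels, so labels follow sorted order
by_level <- as.numeric(factor(S))   # 2 2 1 1 3

# match() against unique() numbers groups by first appearance
by_first <- match(S, unique(S))     # 1 1 2 2 3
```

Which one fits depends on whether the labels should follow sorted order or the order in which groups first occur in the data.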
Ven Yao