
I tried to find a similar post but couldn't.

I have a column in a data table that looks like this:

x,x,x,x,y,y,y,c,c,c

I want to add an index in a separate column, like this:

1,1,1,1,2,2,2,3,3,3

How can I do this?

Sven Hohenstein
  • Real duplicate is here: https://stackoverflow.com/questions/6112803/how-to-create-a-consecutive-index-based-on-a-grouping-variable-in-a-dataframe – Spacedman Aug 31 '17 at 08:02

3 Answers


I'd go with this, which has the advantage of working with both data frames and data tables (and possibly tibbles, though I haven't checked). The index numbers follow the order in which each col value first appears, and they do not depend on equal values being in adjacent rows: if col goes x,x,x,x,y,y,y,x,x,x, every x gets index 1.

> library(data.table)
> dt <- data.table(col = c("x", "x", "x", "x", "y", "y", "y", "c", "c", "c"))
> dt$index = as.numeric(factor(dt$col,levels=unique(dt$col)))
> dt
    col index
 1:   x     1
 2:   x     1
 3:   x     1
 4:   x     1
 5:   y     2
 6:   y     2
 7:   y     2
 8:   c     3
 9:   c     3
10:   c     3
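
To make the point about non-adjacent rows concrete, here is a minimal base R sketch. The input vector is a hypothetical example, not the asker's data, and plain vectors are used instead of a data.table so that it runs without extra packages.

```r
# Hypothetical input: "x" reappears after "y"
col <- c("x", "x", "y", "y", "x", "c")

# Levels follow the order of first appearance, so x -> 1, y -> 2, c -> 3
idx_factor <- as.integer(factor(col, levels = unique(col)))

# match() returns each value's position in unique(col), which is the same coding
idx_match <- match(col, unique(col))

print(idx_factor)
print(identical(idx_factor, idx_match))
```

The last "x" gets the same index as the first ones because the code depends only on the value, not on where its rows sit.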
Spacedman

A solution with data.table:

library(data.table)
dt <- data.table(col = c("x", "x", "x", "x", "y", "y", "y", "c", "c", "c")) 

dt[ , idx := .GRP, by = col]
#     col idx
#  1:   x   1
#  2:   x   1
#  3:   x   1
#  4:   x   1
#  5:   y   2
#  6:   y   2
#  7:   y   2
#  8:   c   3
#  9:   c   3
# 10:   c   3

A solution in base R:

dat <- data.frame(col = c("x", "x", "x", "x", "y", "y", "y", "c", "c", "c")) 

dat <- transform(dat, idx = match(col, unique(col)))
#    col idx
# 1    x   1
# 2    x   1
# 3    x   1
# 4    x   1
# 5    y   2
# 6    y   2
# 7    y   2
# 8    c   3
# 9    c   3
# 10   c   3
Sven Hohenstein
# Assumes the values are in dt$a; correct only when equal values sit in adjacent rows
dt$index <- cumsum(!duplicated(dt$a))
dt
#    a index
# 1  x     1
# 2  x     1
# 3  x     1
# 4  x     1
# 5  y     2
# 6  y     2
# 7  y     2
# 8  c     3
# 9  c     3
# 10 c     3
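
One caveat with the cumsum approach, sketched here on a hypothetical vector where a value reappears later: it counts first appearances seen so far, so a repeated value inherits the running count rather than its own index. The comparison with match() and the rle()-based run index are additions for illustration and are not part of the answer above.

```r
# Hypothetical input: "x" reappears after "y"
x <- c("x", "x", "y", "y", "x", "c")

# Running count of first appearances; the late "x" inherits y's count
idx_cumsum <- cumsum(!duplicated(x))

# Index by value, in order of first appearance; the late "x" gets 1 again
idx_match <- match(x, unique(x))

# Index by run of consecutive equal values; the late "x" starts a new run
r <- rle(x)
idx_run <- rep(seq_along(r$lengths), r$lengths)

print(idx_cumsum)
print(idx_match)
print(idx_run)
```

When equal values are always adjacent, as in the question, all three give the same result.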
minem