1

I have a data.frame with a layout like this:

Data =    Id somevalue
          1   ab
          1   cd
          1   i
          2   o
          2   j

I want to index the rows within each Id, so that I get the following:

Data =    Id somevalue index
          1   ab        1
          1   cd        2
          1   i         3
          2   o         1
          2   j         2

The way I do it now is with

for(ID in search_IDs)
{
   Data[Data[,1]==ID,]$index<-1:length(Data[Data[,1]==ID,1])
}

or, in a more R-like style:

Data<-as.data.frame(sapply(Ids,FUN=(function(x,y)y[y[,1]==x,]$index<-1:length(y[y[,1]==x,1])),y=Data))

However, both take a long time to finish, and I was wondering whether there is a faster way to do this.

R.vW

2 Answers

1

Base R:

x1 <- do.call(
  rbind.data.frame,
  by(x, x$Id, function(df) { df$index <- seq_len(nrow(df)); df; })
)
x1
#     Id somevalue index
# 1.1  1        ab     1
# 1.2  1        cd     2
# 1.3  1         i     3
# 2.4  2         o     1
# 2.5  2         j     2
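
A shorter base R variant, which is not part of the original answer, can be sketched with ave(). It assumes the same column names (Id, somevalue) as the question and rebuilds the sample data inline so the snippet runs on its own. ave() applies the function within each Id group and returns the result in the original row order, so this should also work when rows with the same Id are not adjacent.

```r
# Sample data rebuilt here so the snippet is self-contained
# (same values as in the question).
x <- data.frame(
  Id = c(1, 1, 1, 2, 2),
  somevalue = c("ab", "cd", "i", "o", "j"),
  stringsAsFactors = FALSE
)

# Number the rows within each Id group: ave() splits the row positions
# by x$Id, applies seq_along() to each piece, and puts the results back
# in the original row order.
x$index <- ave(seq_along(x$Id), x$Id, FUN = seq_along)
x
```

Unlike the do.call(rbind, by(...)) approach, this adds the column in place and leaves the row names untouched.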

Using dplyr:

library(dplyr)
x2 <- x %>%
  group_by(Id) %>%
  mutate(index = row_number()) %>%
  ungroup()
x2
# # A tibble: 5 x 3
#      Id somevalue index
#   <int> <chr>     <int>
# 1     1 ab            1
# 2     1 cd            2
# 3     1 i             3
# 4     2 o             1
# 5     2 j             2

Your data:

x <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
Id somevalue
1   ab
1   cd
1   i
2   o
2   j')
r2evans
  • Thanks, from 1.4 min to 0.7 seconds is quite the improvement – R.vW May 17 '18 at 20:20
  • I've seen better -- one problem decreased from [1 week runtime](https://stackoverflow.com/q/48627757/3358272) to [4 seconds](https://stackoverflow.com/questions/48627757/optimization-of-iteration-in-r/48629299#comment84424587_48629299). :-) – r2evans May 17 '18 at 20:30
0
library(tidyverse)
d <- tibble(
  id = c(1, 1, 1, 2, 2),
  somevalue = c('ab', 'cd', 'i', 'o', 'j')
)

d %>% 
  group_by(id) %>%
  mutate(index = 1) %>%
  mutate(index = cumsum(index))
#> # A tibble: 5 x 3
#> # Groups:   id [2]
#>      id somevalue index
#>   <dbl> <chr>     <dbl>
#> 1     1 ab            1
#> 2     1 cd            2
#> 3     1 i             3
#> 4     2 o             1
#> 5     2 j             2
tmastny