0

I have a dataframe data that looks like this:

> data
     id var1 var2
1  1000   32  2.3
2  1000   34  2.5
3  1000   33   NA
4  1000   36  2.4
5  1001   32  3.1
6  1001   NA  2.5
7  1001   45   NA
8  1002   45  2.6
9  1002   37   NA
10 1002   33  3.1
11 1002   NA  3.3

As you can see, each ID has multiple observations (3-4 each). I want to add another variable (column), which acts like an index and numbers each observation within the ID. This is ideally what the dataframe would look like after adding the variable:

> data_goal
     id var1 var2 index
1  1000   32  2.3     1
2  1000   34  2.5     2
3  1000   33   NA     3
4  1000   36  2.4     4
5  1001   32  3.1     1
6  1001   NA  2.5     2
7  1001   45   NA     3
8  1002   45  2.6     1
9  1002   37   NA     2
10 1002   33  3.1     3
11 1002   NA  3.3     4

What would be the best way to do this in R? If it's relevant, my ultimate goal is to reshape the data into "wide" format for further analyses, but for that I need an index variable.

user9264809
  • 25
  • 2
  • 4
  • possible dupe, https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame – tyluRp Jan 25 '18 at 21:46

2 Answers2

0
library(data.table)
 setDT(dat)[,index:=seq(1,.N),by=id]
Onyambu
  • 67,392
  • 3
  • 24
  • 53
0

Here is a solution that uses dplyr:

# reproducing your data
data<- data.frame(rbind(c(1,1000,32,2.3),c(2,1000,34,2.5),c(3,1000,33,NA),
        c(4,1000,36,2.4),c(5,1001,32,3.1),c(6,1001,NA,2.5),c(7,1001,45,NA), 
        c(8,1002,45,2.6),c(9,1002,37,NA),c(10,1002,33,3.1),
        c(11,1002,NA,3.3)))

colnames(data)<-c("row", "id","var1","var2")

library(dplyr)

# use pipes ( %>% ) to do this in a single line of code
data_goal<-data %>% group_by(id) %>% mutate(index=1:n())

You can easily use dplyr to reshape the data too. Here is a resource if you are unfamiliar: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

mshea
  • 41
  • 5