I have this data frame:

df<- data.frame(j = c("a", "a", "b", "b", "c", "c"), 
                t = c(2000,2010,2000,2010,2000,2010))
> df
        j    t
1       a 2000
2       a 2010
3       b 2000
4       b 2010
5       c 2000
6       c 2010

I am trying to create an indicator jt that identifies each country/year combination:

        j    t  jt
1       a 2000  1
2       a 2010  2
3       b 2000  3
4       b 2010  4
5       c 2000  5
6       c 2010  6
7       c 2010  6
8       c 2010  6

The last two rows show that a country/year combination can occur more than once; repeated combinations should get the same jt.

000andy8484

2 Answers

df<- data.frame(j = c("a", "a", "b", "b", "c", "c", "c", "c"), 
                t = c(2000,2010,2000,2010,2000,2010,2010,2010))
df$jt <- paste(df$j, df$t, sep="")
df$jt <- as.factor(df$jt)
str(df)

That makes jt a factor with one level per unique combination of j and t. If you really want numeric codes instead, you can convert the factor, setting the levels so that they follow the order of first appearance:

df$jt <- as.numeric(factor(df$jt, levels = unique(df$jt)))
df
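
To illustrate why the levels are set explicitly, here is a minimal sketch on a hypothetical vector `x` (not part of the question's data) whose values do not appear in alphabetical order. By default, factor() sorts its levels, so the codes follow the sorted order; passing levels = unique(x) numbers the values in order of first appearance instead.

```r
# Hypothetical keys, deliberately not in alphabetical order
x <- c("b2000", "a2000", "b2000", "c2010")

# Default: levels are sorted, so "a2000" gets code 1
sorted_codes <- as.integer(factor(x))

# Explicit levels: codes follow the order of first appearance
appearance_codes <- as.integer(factor(x, levels = unique(x)))

print(sorted_codes)      # 2 1 2 3
print(appearance_codes)  # 1 2 1 3
```

On the question's data the two orders happen to coincide, so either form gives 1 to 6 there.
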
Twitch_City

We can paste the columns of 'df' together row by row ('v1') and get the numeric index by matching 'v1' against its unique values.

 v1 <- do.call(paste0, df)
 df$jt <- match(v1, unique(v1))
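
As a rough sketch of what this returns, the block below rebuilds the question's eight-row data so it runs by itself. One assumption is added that is not in the code above: the key is built with paste() and a "_" separator (the name `key` is only illustrative), so that pairs such as ("a1", 2) and ("a", 12) could not collapse into the same string.

```r
# The question's data, including the repeated c/2010 rows
df <- data.frame(j = c("a", "a", "b", "b", "c", "c", "c", "c"),
                 t = c(2000, 2010, 2000, 2010, 2000, 2010, 2010, 2010))

# One key per row; the "_" separator is an addition, not part of the answer
key <- paste(df$j, df$t, sep = "_")

# match() gives the position of each key among the unique keys
df$jt <- match(key, unique(key))

print(df$jt)  # 1 2 3 4 5 6 6 6
```
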

Or we can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'j' and 't', and assign (:=) the group counter .GRP to 'jt'.

library(data.table)
setDT(df)[, jt := .GRP, by = .(j, t)]
df
#   j    t jt
#1: a 2000  1
#2: a 2010  2
#3: b 2000  3
#4: b 2010  4
#5: c 2000  5
#6: c 2010  6
#7: c 2010  6
#8: c 2010  6
akrun