Hi, I'm using R and I have data like this:

1 2 3 4 5
1 2 1 2 2
3 4 1 2 3
1 2 3 4 5
3 4 1 2 3

I want to give identical rows the same number. For the above example:

1 2 3 4 5 --> 1
1 2 1 2 2 --> 2
3 4 1 2 3 --> 3
1 2 3 4 5 --> 1
3 4 1 2 3 --> 3

Does anyone know how to do this in R (for both the numeric case and the character case)?

Your help is really appreciated!

1 Answer

This is your data:

df <- data.frame(a=c(1,1,3,1,3), 
                 b=c(2,2,4,2,4), 
                 c=c(3,1,1,3,1), 
                 d=c(4,2,2,4,2), 
                 e=c(5,2,3,5,3))

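Approach 1: Use interaction() to build a single factor with one level per distinct row. interaction() itself is base R; the code below also loads data.table because it wraps df in data.table():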

library(data.table)
i <- interaction(data.table(df), drop=TRUE)
df.out <- cbind(df, id=factor(i,labels=length(unique(i)):1))

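This gives the following. Note that id is a factor, and its labels follow the factor's level order rather than the order in which rows first appear: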

#  a b c d e  id
#1 1 2 3 4 5   1
#2 1 2 1 2 2   3
#3 3 4 1 2 3   2
#4 1 2 3 4 5   1
#5 3 4 1 2 3   2
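
As a hedged alternative, interaction() also accepts a plain data frame, so a sketch without data.table is possible. Here match() against unique() is swapped in for the reversed-label trick so that the ids follow order of first appearance, which is the numbering asked for in the question (the variable names are only illustrative):

```r
# Same data as in the answer.
df <- data.frame(a = c(1, 1, 3, 1, 3),
                 b = c(2, 2, 4, 2, 4),
                 c = c(3, 1, 1, 3, 1),
                 d = c(4, 2, 2, 4, 2),
                 e = c(5, 2, 3, 5, 3))

# One factor level per distinct row; drop = TRUE discards unused combinations.
i <- interaction(df, drop = TRUE)

# Position of each row's level among the levels in order of first appearance.
df$id <- match(i, unique(i))

print(df$id)
```

The id column here is a plain integer vector rather than a factor.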

Approach 2: Use the plyr package, with a global counter (.id) that is incremented once per group of identical rows:

library(plyr)
.id <- 0
df.out <- ddply(df, colnames(df), transform, id=(.id<<-.id+1))    

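This gives the following output. Note that ddply returns the rows sorted by the grouping columns, so the original row order is not preserved: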

#  a b c d e  id
#1 1 2 1 2 2   1
#2 1 2 3 4 5   2
#3 1 2 3 4 5   2
#4 3 4 1 2 3   3
#5 3 4 1 2 3   3
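
For the character case, and to keep the original row order without a global counter, one possible base R sketch is to paste each row into a single key and number the keys by first appearance. The helper name row_group_id and the "\r" separator are assumptions made for illustration; the separator only needs to be a string that never occurs in the data.

```r
# Hypothetical helper: returns one integer id per row, where identical rows
# share an id and ids are assigned in order of first appearance.
row_group_id <- function(data) {
  # Build one string key per row; unname() avoids clashes between column
  # names and paste() arguments such as `sep`.
  key <- do.call(paste, c(unname(as.list(data)), sep = "\r"))
  # Factor levels in order of first appearance, then convert to integer codes.
  as.integer(factor(key, levels = unique(key)))
}

# Character example with the same pattern of repeated rows as the question.
chr <- data.frame(x = c("a", "a", "b", "a", "b"),
                  y = c("p", "q", "p", "p", "p"),
                  stringsAsFactors = FALSE)
chr$id <- row_group_id(chr)

print(chr$id)
```

Because the key is built with paste(), numeric columns are compared by their printed representation, which is fine for whole numbers like those in the question but may need care with floating-point values.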

Hope it helps.

Taher A. Ghaleb