
How can I rearrange data in R from

AA 100 NA
BB 200 300
CC 300 NA
DD 100 400

to

AA 100 0   0   0
BB 0   200 300 0
CC 0   0   300 0
DD 100 0   0   400

OR

   100 200 300 400
AA 1   0   0   0
BB 0   1   1   0
CC 0   0   1   0
DD 1   0   0   1
BenBarnes
임상혁
  • It would be helpful to have a [reproducible example](http://stackoverflow.com/q/5963269) and to know what you've tried so far. – BenBarnes Dec 11 '12 at 06:43
  • If your data are in a data.frame named `df`, then `table(data.frame(df[,1], unlist(df[,-1])))` will do the trick. – Josh O'Brien Dec 11 '12 at 06:46
  • @JoshO'Brien make that an answer? `myDF <- data.frame( V1 = c("AA", "BB", "CC", "DD") , V2 = c(100L, 200L, 300L, 100L) , V3 = c(NA, 300L, NA, 400L) )` then `table(data.frame(myDF[,1], unlist(myDF[,-1])))` – Anthony Damico Dec 11 '12 at 07:22
  • @JoshO'Brien, I like your method a lot more. Please go ahead and post that as an answer – Ricardo Saporta Dec 11 '12 at 07:24
  • @AnthonyDamico and Ricardo -- OK, just posted it. Feel free to edit/add explanation to my answer if you like -- I thought it a bit opaque without any explanation, and was feeling a bit lazy myself... – Josh O'Brien Dec 11 '12 at 07:34
  • Thank you for your answer. It solved my problem. I'm not good at R yet, but I'll try it. – 임상혁 Dec 11 '12 at 08:14

3 Answers

df <- read.table(text = "AA 100 NA
BB 200 300
CC 300 NA
DD 100 400")

table(data.frame(letters = df[,1], numbers = unlist(df[,-1])))
#        numbers
# letters 100 200 300 400
#      AA   1   0   0   0
#      BB   0   1   1   0
#      CC   0   0   1   0
#      DD   1   0   0   1
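
The one-liner above is compact, so here is a step-by-step sketch of what it appears to do, plus one possible way to reach the first requested layout (the values themselves instead of 0/1). The names `long`, `counts` and `values` are made up for illustration, and the `sweep()` step is an addition that is not part of the answer above.

```r
df <- read.table(text = "AA 100 NA
BB 200 300
CC 300 NA
DD 100 400")

# Stack the value columns into long form: one (label, value) pair per cell.
# The labels are repeated once for every value column.
long <- data.frame(letters = rep(df$V1, times = ncol(df) - 1),
                   numbers = unlist(df[, -1], use.names = FALSE))

# table() cross-tabulates the pairs; NA values are dropped by default.
counts <- table(long)

# Illustrative extra step: scale each column by the value it stands for,
# turning the 0/1 indicator into the value layout.
values <- sweep(unclass(counts), 2, as.numeric(colnames(counts)), "*")

values
```

Because every label/value pair occurs at most once in this data, the counts are 0 or 1 and the scaling gives back the original numbers; with repeated pairs the cells would hold multiples instead.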
Josh O'Brien
# SAMPLE DATA
myDF <- structure(list(V2 = c(100L, 200L, 300L, 100L), V3 = c(NA, 300L, NA, 400L)), .Names = c("V2", "V3"), class = "data.frame", row.names = c("AA", "BB", "CC", "DD"))

Assuming myDF is your original data frame:

# create columns sequence
Columns <- seq(100, 400, by=100)

newMat <- sapply(Columns, function(c) rowSums(c == myDF, na.rm = TRUE))

# assign names
colnames(newMat) <- Columns

newMat  
#      100 200 300 400
#   AA   1   0   0   0
#   BB   0   1   1   0
#   CC   0   0   1   0
#   DD   1   0   0   1


Explanation:

c == myDF gives a matrix of TRUE/FALSE values.
When you perform arithmetic on TRUE/FALSE, they are treated as 1/0.
Thus, we can take rowSums() over that matrix,
which tells us how many times each row (AA, BB, etc.) contains the value c.

We use sapply to iterate over each column value, 100, 200, etc.
lapply would return a list with one vector of counts per value;
sapply does the same work and then simplifies that list into a matrix.

We then set the column names to the values they stand for.
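
To make the explanation concrete, here is a small sketch that shows the intermediate objects for a single value (300) and the difference between lapply and sapply. The names `hits`, `per_row`, `as_list` and `as_matrix` are illustrative only, and deriving `Columns` from the data instead of hard-coding `seq()` is an assumption added here, not part of the answer.

```r
myDF <- data.frame(V2 = c(100L, 200L, 300L, 100L),
                   V3 = c(NA, 300L, NA, 400L),
                   row.names = c("AA", "BB", "CC", "DD"))

# Comparing one value with the whole data frame gives a logical matrix
# (NA wherever the data are NA).
hits <- 300 == myDF

# TRUE counts as 1, so rowSums() counts the matches in each row.
per_row <- rowSums(hits, na.rm = TRUE)

# Assumption: take the column values from the data; sort() drops the NAs.
Columns <- sort(unique(unlist(myDF)))

# lapply keeps the per-value results as a list ...
as_list <- lapply(Columns, function(v) rowSums(v == myDF, na.rm = TRUE))

# ... while sapply simplifies the same results into a matrix.
as_matrix <- sapply(Columns, function(v) rowSums(v == myDF, na.rm = TRUE))
colnames(as_matrix) <- Columns

as_matrix
```

Deriving the values from the data avoids missing a value that does not fall on the 100-step grid, at the cost of not showing columns for values that never occur.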

Ricardo Saporta

To get the values, one could also use the reshape2 package:

DF <- read.table(text = "AA 100 NA
 BB 200 300
 CC 300 NA
 DD 100 400")

library(reshape2)
dfm <- melt(DF, id = "V1")

dcast(dfm, V1 ~ factor(value), fill = 0)[, -6]
#   V1 100 200 300 400
# 1 AA 100   0   0   0
# 2 BB   0 200 300   0
# 3 CC   0   0 300   0
# 4 DD 100   0   0 400

The `[, -6]` drops the last column of the dcast() result: NA is one of the values in dfm$value, so it gets a column of its own at the end of the cast data frame.
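
A possible variation, sketched here under the assumption that reshape2 is installed: dropping the NA rows while melting means no NA column is created, so the position-based `[, -6]` is not needed. The names `dfm2` and `wide` are illustrative, and this is an alternative to the code above rather than a restatement of it.

```r
library(reshape2)

DF <- read.table(text = "AA 100 NA
BB 200 300
CC 300 NA
DD 100 400")

# na.rm = TRUE removes the rows whose value is NA during the melt.
dfm2 <- melt(DF, id.vars = "V1", na.rm = TRUE)

# Each distinct value becomes a column; empty cells are filled with 0.
wide <- dcast(dfm2, V1 ~ value, value.var = "value", fill = 0)

wide
```

This keeps working if the data gain or lose distinct values, since nothing depends on the NA column being the sixth one.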

Dennis