
I have a single data column that looks like the one below for "one record". There can be hundreds, if not thousands, of records; each record has a different ID but many different attributes in ColName2. I want to transpose the table so it looks like Table 2. Is this possible in R? It is possible in RapidMiner, but I would like to implement it in R.

What I have (Table 1):

  ID   ColName2
  1A   Item1
  1A   Item2
  1A   Item3
  1A   Item4
  2A   Item5

What I want (Table 2):

  ID   Item1  Item2  Item3  Item4  Item5
  1A   1      1      1      1      0
  2A   0      0      0      0      1

Thanks

John Smith

1 Answer


You can use reshape2 for this, for example:

> df <- data.frame(ID = c(rep("1A", 4), "2A"), ColName = 1:5)
> df
#  ID ColName
#1 1A       1
#2 1A       2
#3 1A       3
#4 1A       4
#5 2A       5

> library(reshape2)

> df2 <- dcast(df, ID ~ ColName, fun.aggregate = any, value.var = "ColName")

The result of this reshaping is:

#  ID     1     2     3     4     5
#1 1A  TRUE  TRUE  TRUE  TRUE FALSE
#2 2A FALSE FALSE FALSE FALSE  TRUE

So you get logical values (TRUE where you want 1 and FALSE where you want 0). Because R converts TRUE to 1 and FALSE to 0, you just need to convert every column except the first to numeric. To do this, use lapply on all columns except the first (df2[-1] selects every column but the first) and apply as.numeric to each of them:

> df2[-1] <- lapply(df2[-1], as.numeric)
> df2
#  ID 1 2 3 4 5
#1 1A 1 1 1 1 0
#2 2A 0 0 0 0 1

lapply is often useful when you want to apply a function to every column of a data.frame or every element of a list. For more information, see ?lapply and this question.
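To make the df2[-1] <- lapply(df2[-1], as.numeric) idiom concrete, here is a small self-contained sketch on a hypothetical toy data frame (the name `toy` and its columns `a` and `b` are made up for illustration). It shows that lapply on its own returns a plain list, and that assigning that list back into toy[-1] overwrites those columns while keeping the data.frame and its ID column intact.

```r
# Hypothetical toy data: an ID column followed by logical indicator columns
toy <- data.frame(ID = c("1A", "2A"), a = c(TRUE, FALSE), b = c(FALSE, TRUE))

# toy[-1] is a data.frame holding every column except the first (ID);
# lapply() walks over those columns and returns a plain list of results
converted <- lapply(toy[-1], as.numeric)
str(converted)

# Assigning that list back into toy[-1] overwrites the same columns,
# so toy remains a data.frame and the ID column is left untouched
toy[-1] <- lapply(toy[-1], as.numeric)
toy
```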

talat
  • Thanks for the quick response @beginneR. Can you explain what's happening on the line df2[-1] <- lapply(df2[-1], as.numeric)? – John Smith Aug 19 '14 at 11:44
  • @JohnSmith, I added an explanation. Hope this helps you understand what's happening :) – talat Aug 19 '14 at 11:55
  • Perfect, thank you very much. I am receiving the error "Error in Summary.factor(integer(0), na.rm = FALSE) : any not meaningful for factors", but I was able to run your code and it has put me in the right direction. – John Smith Aug 19 '14 at 12:03
  • @JohnSmith, in that case try `df2 <- dcast(df, ID ~ ColName, value.var = "ColName")` and then `df2[-1] <- lapply(df2[-1], function(x) as.numeric(!is.na(x)))`. Does that work? – talat Aug 19 '14 at 12:08
  • Yep, that works perfectly. Thanks again; without your help I would have been here until Christmas. – John Smith Aug 19 '14 at 12:34
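The "any not meaningful for factors" error comes from any() accepting only logical (or numeric) input, while in the question ColName2 holds item labels, which data.frame() stored as a factor under the pre-R-4.0 default stringsAsFactors = TRUE. Judging by the integer(0) in the message, dcast applied any() to an empty subset of that factor. Below is a sketch of talat's workaround from the comments, applied to the question's Table 1. It assumes the labels are plain character strings (stringsAsFactors = FALSE); the follow-up comment suggests a factor column behaves the same way. The last lines are a base-R alternative using table(), which is not part of the original answer.

```r
library(reshape2)

# Table 1 from the question, with item labels kept as character strings
df <- data.frame(ID = c(rep("1A", 4), "2A"),
                 ColName2 = c("Item1", "Item2", "Item3", "Item4", "Item5"),
                 stringsAsFactors = FALSE)

# No fun.aggregate: each ID/item cell holds the label, or NA when the pair is absent
df2 <- dcast(df, ID ~ ColName2, value.var = "ColName2")

# Present (non-NA) becomes 1, absent (NA) becomes 0
df2[-1] <- lapply(df2[-1], function(x) as.numeric(!is.na(x)))
df2

# Base-R alternative: count ID/item pairs, then turn counts > 0 into 1
tab <- table(df$ID, df$ColName2)
ind <- (tab > 0) * 1
ind
```

The table() version returns a table/matrix rather than a data.frame; it also yields 1 rather than a count if an ID lists the same item twice, because of the > 0 comparison.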