Reorganizing data from 3 rows to 1

Question

I need to reorganize data from a csv file that contains mostly repeating data. I have the data imported into R in a dataframe but I am having trouble with the following:

ID   Language  Author   Keyword
12   eng       Rob      COLOR=Red
12   eng       Rob      SIZE=Large
12   eng       Rob      DD=1
15   eng       John     COLOR=Red
15   eng       John     SIZE=Medium
15   eng       John     DD=2

What I need to do is transform this into a row with each keyword in a separate column

ID   Language  Author  COLOR  SIZE      DD
12   eng       Rob     Red    Large     1

Any ideas?

score 7 · Accepted Answer · edited May 23 '17 at 11:56

7

Using the reshape2 package this is straightforward:

With tt defined as in Gary's answer

library("reshape2")

tt <- cbind(tt, colsplit(tt$Keyword, "=", c("Name", "Value")))
tt_new <- dcast(tt, ID + Language + Author ~ Name, value.var="Value")

which gives

> tt_new
  ID Language Author COLOR DD   SIZE
1 12      eng    Rob   Red  1  Large
2 15      eng   John   Red  2 Medium

edited May 23 '17 at 11:56

Community

1
1

answered Feb 22 '13 at 20:36

Brian Diggs

57,757
13
166
188

I am able to get this to work but after the dcast() runs I get an error saying: Aggregation function missing: defaulting to length. I then get a 1 in each of the new variables. I guess in the real dataset there must be something that is not right as it works fine on this simplified data set. Any ideas? – Ray Feb 25 '13 at 15:30
It seems I have unequal numbers somewhere. Need to figure out how to find the ID with more than 3 rows and clean it up. – Ray Feb 25 '13 at 15:52
You would get that warning if there is a combination of `ID`, `Language`, `Author`, and `Name` that appears more than once. You can tell which one it is/ones they are by looking for a value other than 1 in the result (essentially, when there are multiple values for a combination, reshape will aggregate them into a single value somehow, and by default, that somehow is just counting how many there are). – Brian Diggs Feb 25 '13 at 16:44

score 6 · Answer 2 · edited Feb 23 '13 at 02:22

6

Using plyr ans strsplit you can do something like this :

library(plyr)
res <- ddply(dat,.(ID,Language,Author),function(x){
        unlist(sapply(strsplit(x$Keyword,'='),'[',2))
})

colnames(res)[4:6] <- c('COLOR','SIZE','DD')

 ID Language Author COLOR   SIZE DD
1 12      eng    Rob   Red  Large  1
2 15      eng   John   Red Medium  2

Edit: Here is a generalization that addresses @Brian's concern:

res <- ddply(dat,.(ID,Language,Author), function(x){
             kv <- strsplit(x$Keyword, '=')
             setNames(sapply(kv, `[`, 2),
                      sapply(kv, `[`, 1)) })

edited Feb 23 '13 at 02:22

flodel

87,577
21
185
223

answered Feb 22 '13 at 20:11

agstudy

119,832
17
199
261

2

As a point of caution, this does assume that the keywords are always `COLOR`, `SIZE` and `DD` and always in that order. If that assumption is true, this works fine. – Brian Diggs Feb 22 '13 at 20:42

Gary Weissman · Answer 3 · 2013-02-22T20:43:54.250

1

Try this using reshape2:

tt <- read.table(header=T,text='ID   Language  Author   Keyword
 12   eng       Rob      COLOR=Red
 12   eng       Rob      SIZE=Large
 12   eng       Rob      DD=1
 15   eng       John     COLOR=Red
 15   eng       John     SIZE=Medium
 15   eng       John     DD=2')

tt$Keyword <- as.character(tt$Keyword)

tt <- transform(tt, key_val = lapply(tt$Keyword,function(x) strsplit(x,'=')[[1]][2]),
 key_var = lapply(tt$Keyword,function(x) strsplit(x,'=')[[1]][1]))

tt_new <- dcast (tt, ID + Language + Author ~ key_var, value.var='key_val')

edited Feb 22 '13 at 20:43

answered Feb 22 '13 at 20:18

Gary Weissman

3,557
1
18
23

Your `lapply`'s should be `sapply`'s and you need a `value.var="key_val"` in your `dcast` call. – Brian Diggs Feb 22 '13 at 20:40
Didn't realize `value.var` needed the var in quotes, so it gave me an error. I left it out but `dcast` guessed the right variable appropriately in this case. Thanks. – Gary Weissman Feb 22 '13 at 20:43

Reorganizing data from 3 rows to 1

3 Answers3

Linked