3

I have a table that looks like this:

|Condition|Session|Time|
|        A|      1| 100|
|        A|      1| 200|
|        B|      2| 200|
|        B|      2| 300|
|        B|      2| 500|
|        A|      3| 300|
|        A|      4| 200|

I'd like to convert it to the following format:

|   A|   B|   A|   A|
|   1|   2|   3|   4|
| 100| 200| 300| 200|
| 200| 300|    |    |
|    | 500|    |    |

Meaning that the first two rows are "Condition" and "Session", and the rest of the rows represent the "Time" column (variable number of rows).

How can I achieve this in R?

kshahar
  • 10,423
  • 9
  • 49
  • 73
  • i think as a general strategy i would start by trying to reshape the original df into a wide one where the column names are Condiion.Session, e.g. A.1, B.2. A.3,A.4. The I would just covert the colnames into a df with 2 rows and rbind the df to that. – frankc Nov 19 '12 at 19:52
  • Readers might notice how every answerer is including code to transform the sample table into R readable code via read.table or data.frame. Using `dput()` as in `dput(mytable)` and including the result in the question would be a polite way to save them this step. :-) – MattBagg Nov 19 '12 at 21:33
  • @MattBagg you are right, this wasn't very polite – kshahar Nov 23 '12 at 12:26
  • @kshahar, in retrospect I wasn't either. Sorry about that. – MattBagg Nov 23 '12 at 14:54

4 Answers4

1

First of all, every column of a data.frame has the same type. So you may have your desired table in a transposed shape.

Perhaps you can do as follows:

foo = data.frame(Condition=c("A","A","B","B","B","A","A"), 
                 Session=c(1,1,2,2,2,3,4), 
                 Time = c(1,2,2,3,5,3,2)*100)
bar = aggregate(Time~Condition+Session, foo, identity)
bar
#   Condition Session          Time
# 1         A       1      100, 200
# 2         B       2 200, 300, 500
# 3         A       3           300
# 4         A       4           200
bar[1,3]
# $`0`
# [1] 100 200
Ali
  • 9,440
  • 12
  • 62
  • 92
1

Here is one option. A (potentially large) caveat is that I'm using a (very useful) but non-standard custom function called cbind.fill:

> dat <- read.table(text = "|Condition|Session|Time|
+ |        A|      1| 100|
+ |        A|      1| 200|
+ |        B|      2| 200|
+ |        B|      2| 300|
+ |        B|      2| 500|
+ |        A|      3| 300|
+ |        A|      4| 200|",header = TRUE,sep = "|")
dat$X <- dat$X.1 <- NULL

dat$Condition <- factor(dat$Condition,labels = LETTERS[1:2])

tmp <- with(dat,split(Time,list(Condition,Session)))
tmp <- tmp[sapply(tmp,function(x) length(x) > 0)]
res <- do.call(cbind.fill,tmp)

nm <- strsplit(names(tmp),split="\\.")

res <- rbind(as.numeric(sapply(nm,'[',2)),res)
colnames(res) <- sapply(nm,'[',1)
> res
       A   B   A   A
[1,]   1   2   3   4
[2,] 100 200 300 200
[3,] 200 300  NA  NA
[4,]  NA 500  NA  NA

The core idea for cbind.fill can be found in this question. I won't promise identical results, though, as I'm using a generously modified version of the code there.

Community
  • 1
  • 1
joran
  • 169,992
  • 32
  • 429
  • 468
1

I offer another possible solution using ddply from the plyr package and dcast from reshape2:

library(reshape2)
library(plyr)

dat = read.table(text=gsub("\\|", " ", "|Condition|Session|Time|
|        A|      1| 100|
|        A|      1| 200|
|        B|      2| 200|
|        B|      2| 300|
|        B|      2| 500|
|        A|      3| 300|
|        A|      4| 200|"), header=TRUE)

# Add column 'Rank' for each combination of Condition by Session.
dat = ddply(dat, .(Condition, Session), .fun=summarise, 
            Rank=rank(Time), Time=Time)

res = dcast(dat, Condition + Session ~ Rank, value.var="Time")

# Sort by 'Session'.
res = res[order(res$Session), ]

# As @Ali pointed out, you may want to leave the results as
# an un-transposed data.frame.
res

#   Condition Session   1   2   3
# 1         A       1 100 200  NA
# 4         B       2 200 300 500
# 2         A       3 300  NA  NA
# 3         A       4 200  NA  NA

# Transposing will coerce the data.frame to a character matrix.
t(res)

#           1     4     2     3    
# Condition "A"   "B"   "A"   "A"  
# Session   "1"   "2"   "3"   "4"  
# 1         "100" "200" "300" "200"
# 2         "200" "300" NA    NA   
# 3         NA    "500" NA    NA   
bdemarest
  • 14,397
  • 3
  • 53
  • 56
1
 dat <- read.table(text="Condition|Session|Time
 A|      1| 100
 A|      1| 200
 B|      2| 200
 B|      2| 300
 B|      2| 500
 A|      3| 300
 A|      4| 200", header=TRUE,sep="|")
 tapply(dat$Time, paste(dat$Condition, dat$Session, sep="_"), list)
#----------
$A_1
[1] 100 200

$A_3
[1] 300

$A_4
[1] 200

$B_2
[1] 200 300 500
#--------------------
 tdat <-.Last.value
 lmax <- max(sapply(tdat, function(x) length(x)) )
 as.data.frame( lapply(tdat, function(x) c(x, rep(NA, lmax- length(x)) ) ) )
#---------------------
  A_1 A_3 A_4 B_2
1 100 300 200 200
2 200  NA  NA 300
3  NA  NA  NA 500
IRTFM
  • 258,963
  • 21
  • 364
  • 487