Use spread with three columns where first column has state and second transition state and values in third

Question

I have an input like this .

type1   type2   2
type1   type3   4
type1   type5   3
type2   type1   6
type2   type4   2
type2   type3   3
type3   type1   2
type3   type2   2
type3   type4   4
type4   type1   7
type4   type2   1
type4   type3   4
type5   type1   2
type5   type3   3
type5   type4   1

Here column1 is suppose one state and column two is second state and the third column has values corresponding to this transition. So I want it to be spread such that I have unique states in column1 and the rest of the columns are name with all unique column names and each cell in rows should contain the counts from third column.

So the output should look like this.

types   type1   type2   type3   type4   type5
type1   0       2       4       0       3
type2   6       0       3       2       0
type3   2       2       0       2       0
type4   7       1       4       0       0
type5   2       0       3       1       0

I have tried this which gives error : Duplicate identifiers for rows (7, 8). I don't know how to use spread in this case.

seq=read.csv("test.txt",header=FALSE,sep="\t")
colnames(seq) = c("state1","state2","counts")
seqs=spread(data=seq,state1,state2,fill=0)

Any help is appreciated.

You almost had it. Try `spread(seq, state2, counts, fill = 0)` — talat, Sep 15 '15 at 12:17
@docendo discimus . I had tried it and updated my post earlier but it is errored as it didn't work for a different and larger data. — Akshay Hazari, Sep 15 '15 at 13:38

score 2 · Answer 1 · answered Sep 15 '15 at 12:10

With reshape2 the solution is straightforward:

dcast(df, V1 ~ V2, fill=0)
#     V1 type1 type2 type3 type4 type5
#1 type1     0     2     4     0     3
#2 type2     6     0     3     2     0
#3 type3     2     2     0     4     0
#4 type4     7     1     4     0     0
#5 type5     2     0     3     1     0

score 1 · Accepted Answer · answered Sep 15 '15 at 11:55

1

I think that the code below will solve this for you:

> grid <- with(d, expand.grid(V1=levels(V1), V2=levels(V2)))
> d2 <- merge(d, grid, all.y=TRUE)
> l <- split(d2[c("V2", "V3")], d2$V1)
> t(sapply(l, function(x) { ret <- x$V3; names(ret) <- x$V2; ret}))
      type1 type2 type3 type4 type5
type1    NA     2     4    NA     3
type2     6    NA     3     2    NA
type3     2     2    NA     4    NA
type4     7     1     4    NA    NA
type5     2    NA     3     1    NA

Might not be the nicest looking code to do this, but it was what came to mind.

answered Sep 15 '15 at 11:55

DGKarlsson

1,091
12
18

That works. What I came up with is errored . It didnt work when I added a lot more data. – Akshay Hazari Sep 15 '15 at 12:05
While this works, I wouldn't recommend it considering that there are so much easier solutions to this common reshaping problem. – talat Sep 15 '15 at 12:21
I agree, dcast() is most likely a better solution in this case. – DGKarlsson Sep 15 '15 at 12:43

score 0 · Answer 3 · edited May 23 '17 at 12:29

0

Similar question to this: Rearrange dataframe to a table, the opposite of "melt"

The reshape2::dcast function works well.

Example:

ColA <- rep(c('a', 'b', 'c'), each=3)
ColB <- rep(c('a', 'b', 'c'), times=3)
ColC <- round(runif(9)*10, 0)
df <- data.frame(ColA, ColB, ColC)
require(reshape2)
dcast(df, ColA~ColB)

Output:

  ColA a b c
1    a 8 3 4
2    b 7 8 6
3    c 9 5 8

edited May 23 '17 at 12:29

Community

1
1

answered Sep 15 '15 at 12:15

PatrickB

56
6

Use spread with three columns where first column has state and second transition state and values in third

3 Answers3