0

I have an input like this .

type1   type2   2
type1   type3   4
type1   type5   3
type2   type1   6
type2   type4   2
type2   type3   3
type3   type1   2
type3   type2   2
type3   type4   4
type4   type1   7
type4   type2   1
type4   type3   4
type5   type1   2
type5   type3   3
type5   type4   1

Here column1 is suppose one state and column two is second state and the third column has values corresponding to this transition. So I want it to be spread such that I have unique states in column1 and the rest of the columns are name with all unique column names and each cell in rows should contain the counts from third column.

So the output should look like this.

types   type1   type2   type3   type4   type5
type1   0       2       4       0       3
type2   6       0       3       2       0
type3   2       2       0       2       0
type4   7       1       4       0       0
type5   2       0       3       1       0

I have tried this which gives error : Duplicate identifiers for rows (7, 8). I don't know how to use spread in this case.

seq=read.csv("test.txt",header=FALSE,sep="\t")
colnames(seq) = c("state1","state2","counts")
seqs=spread(data=seq,state1,state2,fill=0)

Any help is appreciated.

Akshay Hazari
  • 3,186
  • 4
  • 48
  • 84

3 Answers3

2

With reshape2 the solution is straightforward:

dcast(df, V1 ~ V2, fill=0)
#     V1 type1 type2 type3 type4 type5
#1 type1     0     2     4     0     3
#2 type2     6     0     3     2     0
#3 type3     2     2     0     4     0
#4 type4     7     1     4     0     0
#5 type5     2     0     3     1     0
Pierre L
  • 28,203
  • 6
  • 47
  • 69
1

I think that the code below will solve this for you:

> grid <- with(d, expand.grid(V1=levels(V1), V2=levels(V2)))
> d2 <- merge(d, grid, all.y=TRUE)
> l <- split(d2[c("V2", "V3")], d2$V1)
> t(sapply(l, function(x) { ret <- x$V3; names(ret) <- x$V2; ret}))
      type1 type2 type3 type4 type5
type1    NA     2     4    NA     3
type2     6    NA     3     2    NA
type3     2     2    NA     4    NA
type4     7     1     4    NA    NA
type5     2    NA     3     1    NA

Might not be the nicest looking code to do this, but it was what came to mind.

DGKarlsson
  • 1,091
  • 12
  • 18
0

Similar question to this: Rearrange dataframe to a table, the opposite of "melt"

The reshape2::dcast function works well.

Example:

ColA <- rep(c('a', 'b', 'c'), each=3)
ColB <- rep(c('a', 'b', 'c'), times=3)
ColC <- round(runif(9)*10, 0)
df <- data.frame(ColA, ColB, ColC)
require(reshape2)
dcast(df, ColA~ColB)

Output:

  ColA a b c
1    a 8 3 4
2    b 7 8 6
3    c 9 5 8
Community
  • 1
  • 1
PatrickB
  • 56
  • 6