
I have a dataset that looks like this:

CATA 1 10101
CATA 2 11101
CATA 3 10011
CATB 1 10100
CATB 2 11100
CATB 3 10011

etc.

and I want to combine these different rows into a single, long row like this:

CATA 101011110110011
CATB 101001110010011

I've tried doing this with melt() and then dcast(), but it doesn't seem to work. Does anyone have some simple pieces of code to do this?

Brian Diggs
Annemarie
  • Yes, I'm sorry for not explaining that properly: There is CATB, CATC, etc. in the data.frame, and I need a single line for each. So I need to be able to distinguish different values in V1... – Annemarie Dec 13 '11 at 15:55
  • Look at my updated answer, it addresses your more up to date question – Chase Dec 13 '11 at 16:31

2 Answers


Look at the paste command and specifically the collapse argument. It's not clear what should happen if/when you have different values for the first column, so I won't venture to guess. Update your question if you get stuck.

    dat <- data.frame(V1 = "CATA", V2 = 1:3, V3 = c(10101, 11101, 10011))
    paste(dat$V3, collapse = "")
    [1] "101011110110011"

Note that if any values have leading zeros, the column needs to be stored as character rather than numeric: a numeric column has already dropped its leading zeros before paste() ever sees it.
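
To illustrate the leading-zero point, here is a minimal sketch that reads the same two rows twice, once with default type guessing and once with `colClasses` forcing the third column to character. The two rows are hypothetical (the question's data has no leading zeros), and the column layout is assumed to match the question.

```r
# Hypothetical rows with leading zeros, held as text so no file is needed.
txt <- "CATA 1 00101
CATA 2 01101"

# Default: the third column is guessed to be integer, so 00101 becomes 101.
as_num <- read.table(text = txt)

# Forcing character for the third column keeps every digit.
as_chr <- read.table(text = txt,
                     colClasses = c("character", "integer", "character"))

paste(as_num$V3, collapse = "")  # "1011101"
paste(as_chr$V3, collapse = "")  # "0010101101"
```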

EDIT: to address multiple values for the first column

Use plyr's ddply() function, which takes a data.frame and one or more grouping variables. Within each group we apply the same paste() trick as before, via summarize().

    library(plyr)
    dat <- data.frame(V1 = sample(c("CATA", "CATB"), 10, TRUE)
                    , V2 = 1:10
                    , V3 = sample(0:100, 10, TRUE)
                    )

    ddply(dat, "V1", summarize, newCol = paste(V3, collapse = ""))

        V1         newCol
    1 CATA          16110
    2 CATB 19308974715042

Your values will differ, because the example data are drawn with sample().
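
For the data shown in the question, the same grouping idea can also be sketched in base R with `aggregate()`, swapped in here for `ddply()` so that no package is needed. The data frame is retyped from the question; the column names `V1` to `V3` and storing `V3` as character are assumptions.

```r
# Rows retyped from the question; V3 is kept as character (an assumption)
# so that every digit survives.
dat <- data.frame(
  V1 = rep(c("CATA", "CATB"), each = 3),
  V2 = rep(1:3, times = 2),
  V3 = c("10101", "11101", "10011", "10100", "11100", "10011"),
  stringsAsFactors = FALSE
)

# Sort by group and position so the pieces are joined in V2 order.
dat <- dat[order(dat$V1, dat$V2), ]

# One row per V1 value; collapse = "" is passed through to paste().
res <- aggregate(V3 ~ V1, data = dat, FUN = paste, collapse = "")
print(res)
```

Here `res` has one row for CATA and one for CATB, with `V3` holding the joined strings asked for in the question.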
Chase
  • As with all these split-apply-combine problems, `tapply` is a base R alternative to `ddply` (with output in a slightly different format). `with(dat, tapply(V3, V1, paste, collapse = ""))`. – Richie Cotton Dec 13 '11 at 17:43

Assuming all possible elements in V1 of dat are known,

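# Known group labels (assumed to be the complete set of values in dat$V1)
elements <- c("CATA", "CATB", "CATC")
final_list <- character(0)
for (el in elements) {
  k <- which(dat$V1 == el)    # rows belonging to this group (exact match)
  if (length(k) == 0) next    # skip labels that do not occur in the data
  # label, a space, then the group's third-column values joined together
  final_list <- c(final_list, paste(el, paste(dat[k, 3], collapse = "")))
}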

@Chase's answer is much better!

384X21