Assume a DF of:

    pnr <- c(1, 1, 1, 2, 2, 3, 4, 5, 5)
    diag <- c("a", "a", NA, "b", "a", NA, "c", "a", "f")
    year <- rep(2007, 9)
    ht <- data.frame(pnr, diag, year)

Now I need to reshape such that:

    library(reshape2)
    md <- melt(ht, id = c("pnr", "year"))
    # counts how often each diag value occurs per pnr (dcast defaults to length here)
    output <- dcast(md, pnr ~ value)

Output is now in the format I want. But when I run this on a large data frame (13 million rows), it crashes RStudio. Is there some smart way to split the data frame, do the dcast on each piece, and combine the results?

EDIT: The solutions posted below will not work in this case, as I am not able to install packages. Surely there is some way to work around this?

Repmat
  • 1
    You could try with `dplyr/tidyr` functions `gather/spread` or convert the data.frame to `data.table` and use `dcast.data.table`. I hope it works. Also, you don't need `as.data.frame(cbind(`, simply `data.frame(` would be enough. The former would convert all the columns to character as `cbind` gets a matrix output and matrix can have only a single class. In your data, there are `character` columns as well. – akrun Mar 11 '15 at 12:39
  • 4
    Try `library(data.table);dcast.data.table(melt(setDT(ht), id=c('pnr', 'year')), pnr~value)` – akrun Mar 11 '15 at 12:46
  • 4
    No need to `melt` here: `dcast.data.table(setDT(ht), pnr ~ diag, value.var="diag")` should be sufficient. – Arun Mar 11 '15 at 12:57
  • dcast.data.table is not found; I guess I have an old version. I'm running on a server, so I guess I won't be seeing an update any time soon. – Repmat Mar 11 '15 at 13:03
  • You can install packages on a local directory. Search on StackOverflow on how to install packages locally. – Arun Mar 11 '15 at 13:26
  • Sorry, I was unclear. I run R on an air-gapped setup: the server is not online, and I do not have physical access to it. – Repmat Mar 11 '15 at 13:33

1 Answer

The easy solution in this case turned out to be switching back to the old reshape package, which means using cast instead of dcast. Arun's comments are very useful, provided one can actually update.
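For anyone who cannot install or update any package at all, the same per-`pnr` counts can probably be obtained with base R alone. This is only a sketch on the toy data from the question, not what was run on the server: it swaps `melt`/`dcast` for `table()`, and the `"NA"` label for missing diagnoses and the names `diag_chr`, `tab` and `wide` are assumptions made for illustration. Whether it fits in memory for 13 million rows would have to be tested.

```r
# Same toy data as in the question
pnr  <- c(1, 1, 1, 2, 2, 3, 4, 5, 5)
diag <- c("a", "a", NA, "b", "a", NA, "c", "a", "f")
year <- rep(2007, 9)
ht   <- data.frame(pnr, diag, year)

# Give missing diagnoses an explicit label so they get their own column
# (the label "NA" is an arbitrary choice for this sketch)
diag_chr <- as.character(ht$diag)
diag_chr[is.na(diag_chr)] <- "NA"

# Cross-tabulate: one row per pnr, one column per diag value, cells are counts
tab <- table(pnr = ht$pnr, diag = diag_chr)

# Turn the contingency table into a wide data frame and restore pnr as a column
wide <- as.data.frame.matrix(tab)
wide <- cbind(pnr = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

Since `table()` only counts, this matches the `dcast` call in the question only because that call also falls back to counting; a different aggregation would need a different approach.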

Repmat
  • What is/was the difference? – KArrow'sBest Feb 02 '23 at 21:25
  • When I was working on this problem (about 8 years ago), I had to work on a server with no internet access, so updating R packages was simply not a possibility. As far as I remember, I was working with an early release of reshape2, which had poor memory management, and it crashed the server :-( Today I would most likely solve this using data.table or something from the tidyverse. – Repmat Feb 24 '23 at 09:54