0

I am trying to unmelt a big table with variables. this thread did only give me hints but did not solve my problem (How to "unmelt" data with reshape r).

I am trying to get from here:

name1   name2   values  group
xyz1    abc1    1   group1
xyz2    ghf 2   group2
xyz3    lmn 3   group1
xyz4    opq 4   group2
xyz1    abc1    5   group2
xyz2    ghf 6   group1
xyz3    lmn 7   group2
xyz4    opq 8   group1
xyz5    stu 2   group3

to

name1   name2   group_1 gruoup_2    group_3
xyz1    abc1    1   6   NA
xyz2    ghf33   6   2   NA
xyz3    lmn93   3   5   NA
xyz4    opq42   8   4   NA
xyz5    stu21   NA  NA  2

my current commands to "unmelt" are

d <- read.table("test.txt", header=T, sep="\t")
e <- dcast(d, name1 + name2 ~ group ~ values)

but this does not assign the values to the groups and name1 and name2 Would be great if you could help me out. Thank you so much! D

Dennis
  • 45
  • 8
  • 3
    Using `data.table`: `dcast(dat, name1 + name2 ~ group, value.var="values")`. – lmo Mar 23 '18 at 23:32
  • dcast(dat, name1+name2~group, value.var="values") works for me (using data.table's dcast) – caw5cv Mar 23 '18 at 23:32
  • To do this in base R, you can use `reshape` like this `reshape(dat, direction="wide", idvar=c("name1", "name2"), timevar="group")`. – lmo Mar 23 '18 at 23:46
  • @lmo Just beat you to it;-) – Maurits Evers Mar 23 '18 at 23:48
  • 1
    It's a close call, but you have an up-vote from me for reminding users of this versatile, but tricky (read maddening) function. – lmo Mar 23 '18 at 23:50
  • @lmo Thanks:-) and yes, definitely not the most intuitive. Personally, I'd use `dplyr::spread` or `reshape2::dcast`. PS. I stand corrected. You were first by a ~10 seconds! – Maurits Evers Mar 23 '18 at 23:53
  • 1
    @MauritsEvers (soapbox) As personal preference, I try to avoid the 'verse. Note that you mention `dplyr::spread`, while an answer mentions `tydyr::spread`. I typically work in a package constrained environment and sometimes have difficulties adding packages. In addition, I still remember a changing interface of `gglot2`, while some poor programming choices on my part, that involved 5+ hours of adjusting a script to fit with the new version. Another ex see discussions in [data.frame to list](https://stackoverflow.com/questions/3492379/data-frame-rows-to-a-list/49290152#49290152). – lmo Mar 24 '18 at 00:05
  • @lmo I see your point and I tend to agree. I'm always getting confused with `tidyverse` routines, and which packages they originate from. I used to do most things in base R but over the last few years I think `tidyverse` has reached a level of maturity that makes it difficult to justify to myself not using those libraries, on account of `tidyverse` routines often being more intuitive and faster. – Maurits Evers Mar 24 '18 at 00:27
  • Guys thank you! you saved me my evening with my wife. so I don't have to do try and error... the dcast command leads to this error: #Aggregation function missing: defaulting to length – Dennis Mar 24 '18 at 00:28
  • @Dennis This means that name1, name2, and group do not uniquely identify your initial dataset. Take a closer look tomorrow. – lmo Mar 24 '18 at 00:44
  • @Dennis The `reshape2::dcast` command would be `reshape2::dcast(df, name1 + name2 ~ group, value.var = "values")`. – Maurits Evers Mar 24 '18 at 00:57

2 Answers2

3

You can use base R's reshape:

reshape(df, idvar = c("name2", "name1"), timevar = "group", direction = "wide")
#  name1 name2 values.group1 values.group2 values.group3
#1  xyz1  abc1             1             5            NA
#2  xyz2   ghf             6             2            NA
#3  xyz3   lmn             3             7            NA
#4  xyz4   opq             8             4            NA
#9  xyz5   stu            NA            NA             2

Or using reshape2::dcast:

reshape2::dcast(df, name1 + name2 ~ group, value.var = "values");
#  name1 name2 group1 group2 group3
#1  xyz1  abc1      1      5     NA
#2  xyz2   ghf      6      2     NA
#3  xyz3   lmn      3      7     NA
#4  xyz4   opq      8      4     NA
#5  xyz5   stu     NA     NA      2

Sample data

df <- read.table(text =
    "name1   name2   values  group
xyz1    abc1    1   group1
xyz2    ghf 2   group2
xyz3    lmn 3   group1
xyz4    opq 4   group2
xyz1    abc1    5   group2
xyz2    ghf 6   group1
xyz3    lmn 7   group2
xyz4    opq 8   group1
xyz5    stu 2   group3", header = T)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • Thank you so much. It seems to work. but takes already 10min to calculate. do duplicates( see my comment below )matter? – Dennis Mar 24 '18 at 00:27
  • @Dennis Try the `reshape2::dcast` option. Not sure about dupes. Perhaps update your sample data to include some of those duplicate entries? – Maurits Evers Mar 24 '18 at 01:00
  • @Dennis Also see [this post](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) for more options on how to reshape from long to wide. – Maurits Evers Mar 24 '18 at 01:04
  • will have a look tomorrow. dcast gives me an error though: Aggregation function missing: defaulting to length and value is always only "1" – Dennis Mar 24 '18 at 01:40
  • @Dennis Are you sure you're using `reshape2::dcast`? The example I give definitely works with the sample data. – Maurits Evers Mar 24 '18 at 01:42
  • Ihr have a Tablet with >100k rows. There Seen to bei odenrical named 1 and 2 within one group. What would be the easiest to get around this? Works on my smaller test data set....see as well below error from tidyr::spread – Dennis Mar 24 '18 at 01:47
  • @Dennis Not sure what you mean by "Tablet"; you need to come up with a strategy on how to deal with these dupes first before reshaping the data. That's really dependent on your data/problem, and has nothing to do with reshaping data. Perhaps `aggregate` rows with identical `name1`, `name2`, `group` by summing the `value`, then reshape? – Maurits Evers Mar 24 '18 at 01:50
  • Meant table. Autocorrections... The best way would be to get only a long table with the dups and to remove them first for the wide table. So I can get back to the data. Would be great if there would be an easy way doing this – Dennis Mar 24 '18 at 01:57
  • Removing dupes is easy with `duplicated`. It might just be a matter of doing `df[!duplicated(df[, -3]), ]` and then using `reshape`/`reshape2::dcast`. If you provide some sample data containing duplicated entries it's easier to give a concrete example. – Maurits Evers Mar 24 '18 at 02:01
  • this is taken from one of the samples same col names...not looking nice w/o code function `ENSG000001853630 PBX12 11.316 group1 ENSG00000274155 AL3910012.1 0 group1 ENSG00000207082 RNU6-1731P 3 group1 ENSG000001853630 PBX12 0 group1` – Dennis Mar 24 '18 at 02:04
  • it seems to be caused by duplicateds. but if I use: `b=b[!duplicated(b$name1, b$name2, b$group),]` it drops about 2000rows and still the same error – Dennis Mar 24 '18 at 02:22
  • @Dennis That's not how `duplicated` works. Can you please edit your question to include a minimal representative sample; this question has moved away from a long to wide reshape to how to deal with/aggregate duplicate rows (and then reshape). – Maurits Evers Mar 24 '18 at 04:41
1

You can use spread in tidyr. That is

spread(my_data, group, values)

For other examples of spread, see here

smanski
  • 541
  • 2
  • 7
  • Thank you so much for your help. the spread function lead to this error: #Error: Duplicate identifiers for rows (3046, 3051), (3715, 3718), (3737, 3738), (5379, 5380), (5983, 5984), (7919, 7932), (9610, 9611, 9612), (10177, 10192), (10595, 10597), (11082, 11083), (11568, 11569), (12209, 12210), (12530, 12532), (12815, 12819), (12834, 12835), (14172, 14173), (14267, 14272), (15134, 15135), (16643, 16644), ..... any way I can sort this out? thanks though. D – Dennis Mar 24 '18 at 00:25