2

I have a dataset with two variables in a wide format that I want to convert to long. I've seen the examples from the cook for R but they are for just one variable.

My dataset (graph.dat) looks like this:

 PH25   PH50    PH75    PH100   P25      P50    P75    P100      r_b    c
3.96    5.94    5.94    5.94    1,694   1,736   1,329   800     0.029   7
2.86    2.86    2.86    2.86    1,665   1,792   1,639   1,335   0.027   7
3.96    5.72    5.72    5.72    1,580   1,560   1,156   663     0.023   7

I want to plot PH (x-axis) versus P(y-axis). So for a PH25 value corresponds a value of P25 (and a value of r_b and c); so the long format will be:

PH       P      r_b         c
3.96    1,694   0.029       7
2.86    1,665   0.027       7
3.96    1,580   0.023       7
5.94    1,736   0.029       7
2.86    1,792   0.027       7
5.72    1,560   0.023       7
5.94    1,329   0.029       7
2.86    1,639   0.027       7
5.72    1,156   0.023       7
5.94    800     0.029       7
2.86    1,335   0.027       7
5.72    663     0.023       7

I try to "melt" it twice but it doesn't work. Here is what I did:

graph.dat1<- melt(graph.dat, id.vars=c("PH25","PH50","PH75",
                                        "PH100","r_b", "c"),
                             variable.name="P", 
                             value.name="Pval")

and the new dataset (graph.dat1) is fine because I've only three values for each "P" (giving me a total of 12 observations). It looks like this:

    PH25    PH50    PH75    PH100   r_b     c    P     Pval
1   3.96    5.94    5.94    5.94    0.029   7   P25    1694
2   2.86    2.86    2.86    2.86    0.027   7   P25    1665
3   3.96    5.72    5.72    5.72    0.023   7   P25    1580
4   3.96    5.94    5.94    5.94    0.029   7   P50    1736
5   2.86    2.86    2.86    2.86    0.027   7   P50     1792
6   3.96    5.72    5.72    5.72    0.023   7   P50     1560
7   3.96    5.94    5.94    5.94    0.029   7   P75     1329
8   2.86    2.86    2.86    2.86    0.027   7   P75     1639
9   3.96    5.72    5.72    5.72    0.023   7   P75     1156
10  3.96    5.94    5.94    5.94    0.029   7   P100    800
11  2.86    2.86    2.86    2.86    0.027   7   P100    1335
12  3.96    5.72    5.72    5.72    0.023   7   P100    663

but when I do the second "melt" it doesn't work and I don't know how to solve it. This is the second step I did:

graph.dat2<- melt (graph.dat1,id.vars=c("r_b", "c", "P", "Pval"),
                   variable.name="PdH",
                   value.name="PH")

but then I get 4 times the number of observations (so instead of 12 observations I get 48 observations). So, my new dataset (graph.dat2) looks like:

    r_b     c   P     Pval  PdH      PH
1   0.029   7   P25   1694  PH25    3.96
2   0.027   7   P25   1665  PH25    2.86
3   0.023   7   P25   1580  PH25    3.96
4   0.029   7   P50   1736  PH25    3.96
5   0.027   7   P50   1792  PH25    2.86
6   0.023   7   P50   1560  PH25    3.96
7   0.029   7   P75   1329  PH25    3.96
8   0.027   7   P75   1639  PH25    2.86
9   0.023   7   P75   1156  PH25    3.96
10  0.029   7   P100   800  PH25    3.96
11  0.027   7   P100  1335  PH25    2.86
12  0.023   7   P100   663  PH25    3.96
13  0.029   7   P25   1694  PH50    5.94
14  0.027   7   P25   1665  PH50    2.86
15  0.023   7   P25   1580  PH50    5.72
...
Uwe
  • 41,420
  • 11
  • 90
  • 134
Emil
  • 23
  • 3

3 Answers3

1

The recent versions of data.table allow to melt multiple columns simultaneously.

Unfortunately, both sets of columns start with the letter P. Simply specifying patterns("PH", "P") won't work as P will match the PH columns as well as the P columns resulting in 24 rows instead of 12. The slightly modified regular expressions patterns("PH", "P\\d") will work:

library(data.table)   # CRAN version 1.10.4 used
graph.dat1 <- melt(setDT(graph.dat), measure.vars = patterns("PH", "P\\d"), 
     value.name = c("PH", "P"))
# rename factor levels of variable
graph.dat1[, variable := forcats::lvls_revalue(variable, c("25", "50", "75", "100"))][]
      r_b c variable   PH     P
 1: 0.029 7       25 3.96 1,694
 2: 0.027 7       25 2.86 1,665
 3: 0.023 7       25 3.96 1,580
 4: 0.029 7       50 5.94 1,736
 5: 0.027 7       50 2.86 1,792
 6: 0.023 7       50 5.72 1,560
 7: 0.029 7       75 5.94 1,329
 8: 0.027 7       75 2.86 1,639
 9: 0.023 7       75 5.72 1,156
10: 0.029 7      100 5.94   800
11: 0.027 7      100 2.86 1,335
12: 0.023 7      100 5.72   663

Note that it is required to coerce graph.dat to a data.table object by using either setDT(graph.dat) or data.table(graph.dat). Otherwise, reshape2::melt() would be dispatched on a data.frame object which doesn't recognize patterns().

Also note, there is an answer to a similar question here but the required patterns() are quite different.

Data

library(data.table)
graph.dat <- fread(
"PH25   PH50    PH75    PH100   P25      P50    P75    P100      r_b    c
3.96    5.94    5.94    5.94    1,694   1,736   1,329   800     0.029   7
2.86    2.86    2.86    2.86    1,665   1,792   1,639   1,335   0.027   7
3.96    5.72    5.72    5.72    1,580   1,560   1,156   663     0.023   7",
data.table = FALSE)
Community
  • 1
  • 1
Uwe
  • 41,420
  • 11
  • 90
  • 134
  • Thanks a lot, Uwe Block! I did see the question and answers you mention but I thought it had only one variable. Now, that I see your answer to my question I have a better understanding of it. Thanks again! – Emil Jul 04 '17 at 17:30
1

An alternative solution in base R would be:

df <- read.table(header=TRUE, text="PH25   PH50    PH75    PH100   P25          P50    P75    P100      r_b    c
3.96    5.94    5.94    5.94    1,694   1,736   1,329   800     0.029   7
2.86    2.86    2.86    2.86    1,665   1,792   1,639   1,335   0.027   7
3.96    5.72    5.72    5.72    1,580   1,560   1,156   663     0.023   7")

(want<-do.call(rbind,
               Map(
                 function(x,y) data.frame(PH=df[[x]],P=df[[y]],r_b=df$r_b,c=df$c),
                 c("PH25","PH50","PH75","PH100"),
                 c("P25","P50","P75","P100")
                 )))
r.user.05apr
  • 5,356
  • 3
  • 22
  • 39
0

not too difficult with base R reshape either:

reshape(df, direction="long", varying=list(paste0("PH", 1:4*25), paste0("P", 1:4*25)))
# note: paste0("PH", 1:4*25) is shortcut for c("PH25", "PH50", "PH75", "PH100")

.. to set the names right (PH instead of PH25 in the long format):

reshape(df, direction="long", varying=list(paste0("PH", 1:4*25), paste0("P", 1:4*25)),
    v.names=c("PH", "P"))
lebatsnok
  • 6,329
  • 2
  • 21
  • 22