Wide to Long data with duplication

Question

I am trying to convert a data set that contains a combination of both wide and long data. Currently it looks something like this:

picture of data

(note: there are many other variables either side, this is just the segment I need to change)

Currently, WTP1 and 2 are in the same row as they are associated with the same participant (buyers). What I need is for there to be one column for WTP so each buyer has two rows which are identical other than the WTP value. Something like this:

WTP

15

5

I have nearly reached a solution using the unite function but the problem is here that the two values are in the same cell rather than in their own rows:

library(dplyr) 
long_Data <- unite(mydata.sub1,WTP,player.WTP1:player.WTP2, sep = "_", remove= TRUE)

unite function output

I'm sure there is an easy solution to this but I'm a beginner! any suggestions welcome. TIA

Please reformat your question (data) and include the code you're working on — Tung, Jul 02 '18 at 17:05
Please share sample of your data using `dput()` (not `str` or `head` or picture/screenshot) so others can help. See more here https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 — Tung, Jul 02 '18 at 17:16
If wtp is willingness to pay per unit ($15 is valuation for the first unit, $5 for the second?), I guess you'll want those rows differentiated, not "identical". Besides first/second unit, you'll also want buyer/seller ids as a column, I guess. — Frank, Jul 02 '18 at 17:17
@Frank yes so I have buyer/seller ID column (I just haven't included it here). I'm happy for them to be undifferentiated as long as R consistently lists them the value for what was WTP1 first and then WTP2 in the following row. Then they won't be explicitly labelled but I will still know which one came first — A.J, Jul 02 '18 at 17:21
Possible duplicate: [*Reshaping data.frame from wide to long format*](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) — Jaap, Jul 02 '18 at 17:22

score 1 · Accepted Answer · answered Jul 02 '18 at 17:54

1

Is this what you're looking for?

df1 <- data.frame(
    player.WTA = c(NA,20,10,NA,10,5,NA),
    player.WTP1 = c(15,NA,NA,15,NA,NA,15),
    player.WTP2 = c(5,NA,NA,5,NA,NA,5)
)

require(reshape2)

melt(df1, id.var="player.WTA", value.name="WTP")

   player.WTA    variable WTP
1          NA player.WTP1  15
2          20 player.WTP1  NA
3          10 player.WTP1  NA
4          NA player.WTP1  15
5          10 player.WTP1  NA
6           5 player.WTP1  NA
7          NA player.WTP1  15
8          NA player.WTP2   5
9          20 player.WTP2  NA
10         10 player.WTP2  NA
11         NA player.WTP2   5
12         10 player.WTP2  NA
13          5 player.WTP2  NA
14         NA player.WTP2   5

And if you just want to see the non-NA values:

require(dplyr)

melt(df1, id.var="player.WTA", value.name="WTP") %>% filter(WTP != 'NA')

  player.WTA    variable WTP
1         NA player.WTP1  15
2         NA player.WTP1  15
3         NA player.WTP1  15
4         NA player.WTP2   5
5         NA player.WTP2   5
6         NA player.WTP2   5

answered Jul 02 '18 at 17:54

Mako212

6,787
1
18
37

this looks very close! my only question is how you do this but without having to enter the exact values at the beginning and the data set I have is over 2000 rows long! – A.J Jul 02 '18 at 18:22
@A.J I'm assuming you mean you have WTP1 - WTP1000 or something like that? This code will still run fine. `id.vars` specifies the rows you don't want to move, so if you have other values to the left that need to be associated with each WTP value, you'd add those to the `id.vars` vector. `melt` assumes that anything that isn't an `id.var` is a `value.var`, meaning that as long as WTP are the only columns not in the `id.var` vector, that those will be the only values in the WTP column. Note that WTP = `value`, I just renamed it to match your example. – Mako212 Jul 02 '18 at 18:37
@A.J you might also find the `data.table` version of `melt` to be useful, like Frank uses in his example, because you can specify the measure variables with a Regex pattern, i.e. `patterns ("^WT")` which can make it easier to separate out your value/measure variables. – Mako212 Jul 02 '18 at 20:06

Frank · Answer 2 · 2018-07-02T18:10:25.390

Just looking at your first three rows and making some assumptions (adding id and endowment columns)...

DF = data.frame(
  id = 1:3, 
  endowment = c(0L, 1L, 1L), 
  WTA = c(NA, 20, 10), 
  WTP.1 = c(15, NA, NA), 
  WTP.2 = c(5, NA, NA)
)

  id endowment WTA WTP.1 WTP.2
1  1         0  NA    15     5
2  2         1  20    NA    NA
3  3         1  10    NA    NA

I would recommend formatting this in a consistent way for buyers and sellers:

prefs = data.frame(id = rep(1:3, c(2,1,1)), quantity = c(1L,2L,-1L,-1L), valuation = c(15,20,-20,-10))

  id quantity valuation
1  1        1        15
2  1        2        20
3  2       -1       -20
4  3       -1       -10

players = data.frame(id = 1:3, endowment = c(0L, 1L, 1L))

  id endowment
1  1         0
2  2         1
3  3         1

The R code I'd use to get there from DF...

library(data.table)
players = DF[, c("id", "endowment")]

prefs = melt(data.table(DF), id = "id", measure.vars = patterns("^WT"), na.rm = TRUE) 

# translate
prefs[variable == "WTA", `:=`(quantity = -1L, valuation = -value)]
prefs[variable %like% "WTP", `:=`(quantity = 1:.N, valuation = cumsum(value)), by=id]

# clean up
prefs[, c("variable", "value") := NULL]

this looks promising! as I asked Mako, is there any way around entering the exact values for WTA, WTP1 and 2 as this is just a screenshot of a file containing over 2000 rows so manually entry seems quite inefficient — A.J, Jul 02 '18 at 18:24
@A.J You don't need to enter exact values like Mako and I did with `DF` and `df1` -- that code is just there to make a concrete reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 (Ideally, you'd have code like that in the original question, so we don't have to create it.) I'm not sure if that answers your question. I assume you have the dataset `mydata.sub1` already..? — Frank, Jul 02 '18 at 18:32

Wide to Long data with duplication

2 Answers2