I am creating a fake dataset, and would like to essentially disaggregate a sum to create dummy rows that I can populate with random dates.
For example, my `df` might look like this:
id orders skips
joe 3 0
mary 2 1
jack 5 1
What I want to produce is a `data.frame` or `data.table` that looks like this, where a successful order is `1` and a skip is `0`:
id order
joe 1
joe 1
joe 1
mary 1
mary 0
mary 1
jack 1
jack 1
jack 1
jack 1
jack 0
jack 1
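A minimal base-R sketch of the expansion step, assuming the input has exactly the columns `id`, `orders` and `skips` shown above. Each `id` is repeated `orders + skips` times, and for each `id` the `order` column gets `orders` ones followed by `skips` zeros (no shuffling yet):

```r
# Example input from the question (column names assumed from the table above)
df <- data.frame(
  id     = c("joe", "mary", "jack"),
  orders = c(3L, 2L, 5L),
  skips  = c(0L, 1L, 1L)
)

out <- data.frame(
  # one row per order or skip: repeat each id (orders + skips) times
  id    = rep(df$id, df$orders + df$skips),
  # within each id, emit `orders` ones followed by `skips` zeros
  order = unlist(Map(function(o, s) c(rep(1L, o), rep(0L, s)),
                     df$orders, df$skips))
)

print(out)
```

Because `rep()` and `Map()` both walk the rows of `df` in the same order, the `id` and `order` vectors line up row for row.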
ADDITION: Ideally, the `0` values would be randomly mixed in (sandwiched) between the `1` values. This is due to a quirk of how the dataset will be used in a problem set.
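One way to get the zeros sandwiched is sketched below. It assumes "sandwiched" means "not at the start or end of an id's sequence": when an id has at least two orders, the first and last entries are fixed to `1` and everything in between (the remaining ones plus all zeros) is shuffled. `sandwich()` is a hypothetical helper name, not part of any package.

```r
set.seed(42)  # only so the shuffle is reproducible

df <- data.frame(
  id     = c("joe", "mary", "jack"),
  orders = c(3L, 2L, 5L),
  skips  = c(0L, 1L, 1L)
)

# Hypothetical helper: build one id's 1/0 sequence with the zeros kept
# away from both ends whenever there are at least two orders.
sandwich <- function(orders, skips) {
  if (orders < 2) {
    # not enough ones to bracket the zeros; fall back to a plain shuffle
    v <- c(rep(1L, orders), rep(0L, skips))
    return(v[sample.int(length(v))])
  }
  middle <- c(rep(1L, orders - 2L), rep(0L, skips))
  # shuffle the interior positions, then pin a 1 at each end
  c(1L, middle[sample.int(length(middle))], 1L)
}

out <- data.frame(
  id    = rep(df$id, df$orders + df$skips),
  order = unlist(Map(sandwich, df$orders, df$skips))
)

print(out)
```

Indexing with `sample.int(length(v))` rather than calling `sample(v)` avoids R's quirk where `sample()` on a single number draws from `1:n`. Note that this sketch does not stop two zeros from landing next to each other when an id has more than one skip.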
In a perfect world, I'd then assign a random `start_date` from a given range to each order within `id`, such that:
id order date
joe 1 1/2/2016
joe 1 1/3/2016
joe 1 1/8/2016
mary 1 1/10/2016
mary 0 1/3/2016
mary 1 1/5/2016
jack 1 1/7/2016
jack 1 1/2/2016
jack 1 1/1/2016
jack 1 1/10/2016
jack 0 1/12/2016
jack 1 1/15/2016
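The date step could look like the sketch below. The range 2016-01-01 to 2016-01-15 is a hypothetical stand-in for "a given range". Within each `id`, day offsets are drawn without replacement via `ave()`, so no id gets the same date twice (which matches the example, where each id's dates are distinct); this requires the range to have at least as many days as the largest group.

```r
set.seed(1)  # only so the draw is reproducible

df <- data.frame(
  id     = c("joe", "mary", "jack"),
  orders = c(3L, 2L, 5L),
  skips  = c(0L, 1L, 1L)
)
out <- data.frame(
  id    = rep(df$id, df$orders + df$skips),
  order = unlist(Map(function(o, s) c(rep(1L, o), rep(0L, s)),
                     df$orders, df$skips))
)

# Hypothetical date range; replace with the real one
start_date <- as.Date("2016-01-01")
end_date   <- as.Date("2016-01-15")
n_days     <- as.integer(end_date - start_date) + 1L  # 15 days, inclusive

# For each id, draw distinct day offsets in 0..(n_days - 1)
offset <- ave(seq_len(nrow(out)), out$id,
              FUN = function(i) sample.int(n_days, length(i)) - 1L)
out$date <- start_date + offset

print(out)
```

If repeated dates within an id are acceptable, passing `replace = TRUE` to `sample.int()` removes the group-size limit.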
I initially thought I could use a combination of `dcast` and `reshape` to trick R into making the dataset, e.g. `dcast(df, id ~ orders, fun.aggregate = length)`, but this took me down the wrong path.
But one must crawl before they walk. Is anyone able to help?