R spread data frame

Question

I have this data set (dput output):

structure(list(Account.Name = c("CMD", "CMD", "CMD", "CMD", "CMD", 
"CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD", 
"CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "CMD", "Colimbra", 
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", 
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", 
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", 
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", 
"Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", "Colimbra", 
"Colimbra"), Date.y = structure(c(47L, 38L, 39L, 46L, 29L, 30L, 
31L, 37L, 36L, 34L, 43L, 45L, 41L, 42L, 33L, 40L, 27L, 28L, 32L, 
35L, 44L, 26L, 9L, 24L, 17L, 23L, 18L, 6L, 8L, 5L, 12L, 10L, 
7L, 11L, 35L, 25L, 19L, 16L, 34L, 27L, 4L, 26L, 20L, 29L, 15L, 
33L, 32L, 30L, 14L, 22L, 31L, 13L, 21L, 28L), .Label = c("", 
"2012-12-01", "2013-01-01", "2013-02-01", "2013-03-01", "2013-04-01", 
"2013-05-01", "2013-06-01", "2013-07-01", "2013-08-01", "2013-09-01", 
"2013-10-01", "2013-11-01", "2013-12-01", "2014-01-01", "2014-02-01", 
"2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01", "2014-07-01", 
"2014-08-01", "2014-09-01", "2014-10-01", "2014-11-01", "2014-12-01", 
"2015-01-01", "2015-02-01", "2015-03-01", "2015-04-01", "2015-05-01", 
"2015-06-01", "2015-07-01", "2015-08-01", "2015-09-01", "2015-10-01", 
"2015-11-01", "2015-12-01", "2016-01-01", "2016-02-01", "2016-03-01", 
"2016-04-01", "2016-05-01", "2016-06-01", "2016-07-01", "2016-08-01", 
"2016-09-01"), class = "factor"), EI = c(0.172413778757433, 0.283582069077747, 
0.304347804744803, 0.278195468486632, 0.675675653544559, 0.965738751378275, 
0.79789472055251, 0.571428546702807, 0.364238387240035, 0.333333310925928, 
0.333333310925928, 0.267175552791797, 0.30935249644739, 0.30935249644739, 
0.547169786306516, 0.342465730716834, 0.25581393431044, 0.593220290504169, 
0.529411739555941, 0.538461513372782, 0.333333310925928, 0.119266044513089, 
0.00689655157368212, 0.0932835783248028, 0.117967327490881, 0.111415832683409, 
0.0864661618980282, 0.0170648454887846, 0.0380999488912474, 0.00803673911715819, 
0.0500855092066307, 0.00942675138629104, 0.0201612894472413, 
0.0046082948309584, 0.00435339299151454, 0.144554447192982, 0.0830188645366324, 
0.0825861213183505, 0.0129474483080438, 0.0240963850193243, 0.00917431152659711, 
0.0215175530933231, 0.0953932023013541, 0.00917431172607524, 
0.0873239401148782, 0.00892174336861008, 0.018429689070739, 0.0352357312529589, 
0.0470588220329153, 0.059847657373831, 0.00588084875970071, 0.0479921625133198, 
0.229030327296333, 0.00613496919149197)), .Names = c("Account.Name", 
"Date.y", "EI"), row.names = c(69L, 70L, 71L, 72L, 73L, 74L, 
75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 
88L, 89L, 90L, 91L, 95L, 96L, 99L, 101L, 104L, 105L, 107L, 108L, 
109L, 110L, 111L, 113L, 114L, 116L, 117L, 118L, 119L, 120L, 121L, 
122L, 123L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L, 
134L), class = "data.frame")

I need to spread (pivot) it so that each row is one Account.Name and each column corresponds to one Date.y value for that account. The column names should be relative: date_0 for the account's last Date.y value, through date_i, where i is the index of the account's first date record, counting from the end back to the beginning. For instance:

Account.Name  date_0       date_1       date_2       ...
CMD           0.333333311  0.333333311  0.309352496  ...

where date_0 corresponds to 2016-06-01, date_1 corresponds to 2016-05-01, date_2 corresponds to 2016-04-01, and so on. I tried tidyr::spread; however, the column names are taken from the original date values, and I want relative date column names instead (counting from date_0, date_1, and so on up to the last date for each account). Any ideas are appreciated.

Can you show the full expected output? Have you tried `library(reshape2); dcast(df1, Account.Name ~ Date.y, value.var = "EI")`? — akrun, Sep 18 '16 at 12:25
If I had the full expected output, I wouldn't be here. Is there something specific that you don't understand in the expected output? — RNN, Sep 18 '16 at 12:43
Sorry, I didn't understand what you wanted because the minimum and maximum dates in the dput output are `"2013-02-01"` and `"2016-09-01"`. So how does `date_0` correspond to `"2016-06-01"`? Also, if you look at other posts with reproducible examples, most of them provide the expected output (perhaps created manually) so that others can understand it clearly. — akrun, Sep 18 '16 at 12:46

score 0 · Accepted Answer · answered Sep 18 '16 at 14:07

Let x be your data frame.

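library(data.table)
library(lubridate)
dt <- data.table(x)
# Date.y is a factor; convert it to Date so it sorts chronologically
dt[, Date.y := ymd(as.character(Date.y))]
# newest date first within each account
setorder(dt, Account.Name, -Date.y)
# relative index per account: 0 is the most recent date
dt[, col_index := 0:(.N - 1L), by = Account.Name]
# one row per account, one column per relative index
dt_casted <- dcast(dt, Account.Name ~ col_index, value.var = "EI")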

Note that I didn't use the "date_0" format because I believe you will want the columns sorted, and "date_10" would sort before "date_2". It is better to keep the index numeric, or to pad it with leading zeros.
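If the "date_" prefix is still wanted, the zero-padding idea could look roughly like the sketch below. It is not taken from the original answer's code: the toy data frame, the idx and col_name columns, and the width and fmt variables are all made up for illustration, and base as.Date is used instead of lubridate only to keep the example short.

```r
library(data.table)

# Toy data standing in for the asker's data frame (values are made up)
x <- data.frame(
  Account.Name = c("A", "A", "A", "B", "B"),
  Date.y = factor(c("2016-01-01", "2016-03-01", "2016-02-01",
                    "2015-05-01", "2015-06-01")),
  EI = c(0.1, 0.3, 0.2, 0.5, 0.6)
)

dt <- as.data.table(x)
# convert the factor to Date so ordering is chronological
dt[, Date.y := as.Date(as.character(Date.y))]
# newest date first within each account
setorder(dt, Account.Name, -Date.y)
# relative index per account: 0 is the most recent date
dt[, idx := seq_len(.N) - 1L, by = Account.Name]

# pad to the width of the largest index so the names sort as text
width <- nchar(max(dt$idx))
fmt <- paste0("date_%0", width, "d")
dt[, col_name := sprintf(fmt, idx)]

# accounts with fewer dates get NA in the trailing columns
wide <- dcast(dt, Account.Name ~ col_name, value.var = "EI")
print(wide)
```

With the toy data the columns come out as date_0, date_1, date_2. With more than ten dates per account the same code would produce date_00, date_01, and so on, which sort correctly as text.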
