2

I have a dataframe with multiple columns of months and total (13 columns in total) and two periods (half a year, labeled as 1 and 7).

I am trying to spread it wide and have it presented as January1 and January7 (for all 12 months + Total and Total1). And then I will calculate the difference between periods.

Please advise how to do it.

I tried spread(), old Hadley's function, but these multiple months complicate everything with keys and values. Also failed with pivot_wide().

Obviously tried multiple questions like this.

My data sample is here.

structure(list(Year = c(2019, 2019, 2019, 2019, 2019), Period = c(1, 
1, 7, 1, 7), KPKV = c(99999, 110000, 111000, 111010, 111010), 
KEKV = c(4, 1, 1, 2, 2), Name = c("A", "B", "B", "B", "B"
), January = c(70198346.4, 125181.4, 125181.4, 64008.4, 34374.1
), February = c(71052496.2, 127697.1, 127697.1, 66007.3, 
34719.1), March = c(96884031.5, 142375.3, 142375.3, 75510.2, 
38082.1), April = c(74389605.4, 139627.8, 139627.8, 75891.9, 
37262.5), May = c(101876908, 144649.4, 144649.4, 79889.6, 
41150), June = c(86362730.8, 178706.8, 178706.8, 96616, 49727.9
), July = c(74326532.8, 178708.4, 178708.4, 96616, 55955.7
), August = c(80052666.3, 186225.8, 186225.8, 102606.5, 30816.8
), September = c(90236044.8, 182131, 182131, 102885.7, 49123.1
), October = c(79077964, 175287.8, 175287.8, 101166.1, 49942.8
), November = c(92509081.2, 185182.1, 185182.1, 109051.8, 
37609.2), December = c(88801141.2, 198270.2, 198270.2, 119648, 
37609.2), Total = c(1005767549, 1964043.1, 1964043.1, 1089897.5, 
496372.5)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), spec = structure(list(cols = list(
Year = structure(list(), class = c("collector_double", "collector"
)), Period = structure(list(), class = c("collector_double", 
"collector")), KPKV = structure(list(), class = c("collector_double", 
"collector")), KEKV = structure(list(), class = c("collector_double", 
"collector")), Name = structure(list(), class = c("collector_character", 
"collector")), January = structure(list(), class = c("collector_double", 
"collector")), February = structure(list(), class = c("collector_double", 
"collector")), March = structure(list(), class = c("collector_double", 
"collector")), April = structure(list(), class = c("collector_double", 
"collector")), May = structure(list(), class = c("collector_double", 
"collector")), June = structure(list(), class = c("collector_double", 
"collector")), July = structure(list(), class = c("collector_double", 
"collector")), August = structure(list(), class = c("collector_double", 
"collector")), September = structure(list(), class = c("collector_double", 
"collector")), October = structure(list(), class = c("collector_double", 
"collector")), November = structure(list(), class = c("collector_double", 
"collector")), December = structure(list(), class = c("collector_double", 
"collector")), Total = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

UPDATED:

After using the first solution, the data was transformed, but not everything was done properly. Some columns were missing. I believe it happened because KPKV column is unique, but KEKV column can have multiple values under the same KPKV.

MY EXPECTED OUTPUT

structure(list(Year = 2019, KPKV = 99999, KEKV = 4, Name = "Random name", 
April1 = 74389605.4, April7 = NA_real_, August1 = 80052666.3, 
August7 = NA_real_, December1 = 88801141.2, December7 = NA_real_, 
February1 = 71052496.2, February7 = NA_real_, January1 = 70198346.4, 
January7 = NA_real_, July1 = 74326532.8, July7 = NA_real_, 
June1 = 86362730.8, June7 = NA_real_, March1 = 96884031.5, 
March7 = NA_real_, May1 = 101876908, May7 = NA_real_, November1 = 
92509081.2, 
November7 = NA_real_, October1 = 79077964, October7 = NA_real_, 
September1 = 90236044.8, September7 = NA_real_, Total1 = 1005767548.6, 
Total7 = NA_real_), row.names = 1L, class = "data.frame")
Anakin Skywalker
  • 2,400
  • 5
  • 35
  • 63
  • Do you need `library(data.table);dcast(setDT(df1), Name + KPKV + Year ~ Period, value.var = month.name, sep="")` – akrun Aug 26 '19 at 17:48

2 Answers2

3

use the base R reshape function:

 reshape(data.frame(df),idvar = "Name",timevar = "Period",dir="wide",sep="")

  Name Year1  KPKV1 KEKV1   January1  February1     March1     April1        May1      June1      July1    August1 September1
1    A  2019 110000     1 70198346.4 71052496.2 96884031.5 74389605.4 101876908.0 86362730.8 74326532.8 80052666.3   90236045
2    B  2019 110000     1   125181.4   127697.1   142375.3   139627.8    144649.4   178706.8   178708.4   186225.8     182131
    October1  November1  December1     Total1 Year7  KPKV7 KEKV7 January7 February7   March7   April7     May7    June7
1 79077964.0 92509081.2 88801141.2 1005767549  2019 111000     1 125181.4  127697.1 142375.3 139627.8 144649.4 178706.8
2   175287.8   185182.1   198270.2    1964043  2019 111010     1  64008.4   66007.3  75510.2  75891.9  79889.6  96616.0
     July7  August7 September7 October7 November7 December7  Total7
1 178708.4 186225.8   182131.0 175287.8  185182.1  198270.2 1964043
2  96616.0 102606.5   102885.7 101166.1  109051.8  119648.0 1089898
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • The code works perfectly, very grateful. But my input was wrong! I updated the input. I am sample input I thought that KPKV and KEKV columns have unique values, but it was wrong. KEPV is a code for Name, but can have multiple KEKV under it. I changed the input. – Anakin Skywalker Aug 26 '19 at 17:40
  • 1
    @Oleksiy the code still works with the updated dataframe. Thats what I mean – Onyambu Aug 26 '19 at 17:42
  • Warning messages: 1: In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, : multiple rows match for Month=1: first taken 2: In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, : multiple rows match for Month=7: first taken – Anakin Skywalker Aug 26 '19 at 17:45
  • 1
    @Oleksiy Where is the warning coming from? Did you change the code I gave? – Onyambu Aug 26 '19 at 17:45
  • Nop, only variable's name, which should not affect it. `budget_wide <- reshape(data.frame(budget_renamed), idvar = "Name", timevar = "Month", dir = "wide", sep = "")` It omits rows `[ reached 'max' / getOption("max.print") -- omitted 710 rows ]` – Anakin Skywalker Aug 26 '19 at 17:47
  • 1
    @Oleksiy `timevar="Period"` not `month` you do not have a variable `month` in the dataframe you provided – Onyambu Aug 26 '19 at 17:49
  • Yes, I renamed `Period` to `Month`, it is the same for me in both cases (the warning). – Anakin Skywalker Aug 26 '19 at 17:50
  • 1
    @Oleksiy have just run the code and it works. I am not quite sure where the problem is. You will have to ask this question again and give abit bigger dataframe. Questions asked beyond 5hours are rarely visited by new people to answer them – Onyambu Aug 26 '19 at 17:53
  • Might do it again, thanks for your time and patience! – Anakin Skywalker Aug 26 '19 at 17:55
  • 1
    Looks like OP is correct regarding warnings per this [rextester demo](https://rextester.com/BKGIO58035). There are multiple 1's and 7's for each year and month. OP needs to advise how to handle: *April1_1, April1_2*)? – Parfait Aug 26 '19 at 18:24
  • 2
    @Oleksiy with the updated results, it seems your `idvar=KEKV` and `name` and .... Thus you should do: `reshape(data.frame(df),idvar = c("KEKV","Name","Year"),timevar = "Period",dir="wide",sep="")`, you can add any other id variable you think of and see whether you get what you want – Onyambu Aug 26 '19 at 18:42
  • 1
    @Parfait with the updated data and expected results, the `id` seems to be the `KEKV` – Onyambu Aug 26 '19 at 18:43
2

An option is pivot_wider from the devel version of tidyr

library(tidyr) #‘0.8.3.9000’
library(dplyr)
df1 %>% 
    pivot_wider(id_cols =  Name, names_from = Period, 
       values_from = c(January:December), names_sep = "")
# A tibble: 2 x 25
#  Name  January1 January7 February1 February7 March1 March7 April1 April7   May1   May7  June1  June7  July1  July7 August1 August7 September1 September7
#  <chr>    <dbl>    <dbl>     <dbl>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>      <dbl>      <dbl>
#1 A       7.02e7  125181. 71052496.   127697. 9.69e7 1.42e5 7.44e7 1.40e5 1.02e8 #1.45e5 8.64e7 1.79e5 7.43e7 1.79e5  8.01e7 186226.  90236045.    182131 
#2 B       1.25e5   64008.   127697.    66007. 1.42e5 7.55e4 1.40e5 7.59e4 1.45e5 7.99e4 1.79e5 9.66e4 1.79e5 9.66e4  1.86e5 102606.    182131     102886.
# … with 6 more variables: October1 <dbl>, October7 <dbl>, November1 <dbl>, November7 <dbl>, December1 <dbl>, December7 <dbl>

Or with previous tidyr version

library(dplyr)
library(tidyr)
df1 %>% 
    gather(key, val, January:December) %>%
    unite(key, key, Period, sep="") %>%
    spread(key, val)

Or data.table option

library(data.table)
dcast(setDT(df1), Name + KPKV + Year ~ Period, value.var = month.name, sep="") 
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks, having some issues with dev version of tidyr, but will fix it! – Anakin Skywalker Aug 25 '19 at 00:09
  • 1
    @Oleksiy Could you please updatee with the expected output in your posst – akrun Aug 26 '19 at 17:10
  • 1
    @Oleksiy with the previous version, do you need `df1 %>% gather(key, val, January:December) %>% unite(key, key, Period, sep="") %>% spread(key, val)` – akrun Aug 26 '19 at 17:13
  • 1
    @Oleksiy I saw your input data. Can you update with expected – akrun Aug 26 '19 at 17:43
  • your solution `df1 %>% gather(key, val, January:December) %>% unite(key, key, Period, sep="") %>% spread(key, val)` looks better, now I have 2000+ rows instead of 700+ before. Your dt soltion is also the same in number of rows `budget_dt <- dcast(setDT(budget_renamed), Name + KPKV + KEKV + Year ~ Month, value.var = month.name, sep = "")` but the column `Total missing`. I am double checking the exported reshaped data, trying to understand it. But the number of rows looks much better. Thanks! – Anakin Skywalker Aug 26 '19 at 18:10
  • @Oleksiy Not sure what is your expected – akrun Aug 26 '19 at 18:15
  • 1
    Updated expected, looks like all works! Can you please update your answer with `df1 %>% gather(key, val, January:December) %>% unite(key, key, Period, sep="") %>% spread(key, val)`, I will accept it – Anakin Skywalker Aug 26 '19 at 18:22