1

I have a large csv data file, sample of the data as below.

name                  year            value
China                 1997            481970
Japan                 1997            8491480
Germany               1997            4678022
China                 1998            589759
Japan                 1998            7912546
Germany               1998            5426582

After several attempts with no success, I would like to interpolate my data to monthly, and then change the format of data to be as in the example below,

date             China       Japan        Germany       
1997-01-31       40164.17    707623.33    389835.17
1997-02-28       80328.33    1415246.67   779670.33
1997-03-30       1204925     2122870      1169505.50
1997-04-30       160656.67   2830493.33   1559340.67  
1997-05-31       200820.83   3538116.67   1949175.83
    .               .           .              .
    .               .           .              .
    .               .           .              .
1997-12-31       481970       8491480      4678022

1998-01-31       49146.58     659378.83    452215.17
1998-02-28       98293.17     1318757.67   904430.33
1998-03-30       147439.75    1978136.5    1356645.5 
1998-04-30       196586.33    2637515.33   1808860.67
1998-05-31       245732.97    3296894.17   2261075.83
    .               .           .              .
    .               .           .              .
    .               .           .              .
1998-12-31        589759      7912546      5426582

someone suggested this How to pivot a dataframe though it proved hard for me to reach the desired results. Maybe I'm not that good in python.

I would like to do it in R. Thoughts?

2 Answers2

2

Assuming the input shown reproducibly in the Note at the end convert it to a zoo object z which, by specifying split=, will also convert it to wide form at the same time. Then expand it using merge and use linear interpolation with na.approx. Alternately replace na.approx with na.spline. Finally convert the time index to Date class. The result is a zoo object m. If you need a data frame use fortify.zoo(m).

library(zoo)

z <- read.zoo(DF, split = 1, index = 2, FUN = as.numeric)

m <- na.approx(merge(z, zoo(, c(kronecker(time(z), 0:11/12, "+")))))
time(m) <- as.Date(as.yearmon(time(m)), frac = 1)
m

giving:

              China Germany   Japan
1997-01-31 481970.0 4678022 8491480
1997-02-28 490952.4 4740402 8443236
1997-03-31 499934.8 4802782 8394991
1997-04-30 508917.2 4865162 8346747
1997-05-31 517899.7 4927542 8298502
1997-06-30 526882.1 4989922 8250257
1997-07-31 535864.5 5052302 8202013
1997-08-31 544846.9 5114682 8153769
1997-09-30 553829.3 5177062 8105524
1997-10-31 562811.8 5239442 8057280
1997-11-30 571794.2 5301822 8009035
1997-12-31 580776.6 5364202 7960790
1998-01-31 589759.0 5426582 7912546

Note

Lines <- "name                  year            value
China                 1997            481970
Japan                 1997            8491480
Germany               1997            4678022
China                 1998            589759
Japan                 1998            7912546
Germany               1998            5426582"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
1

An option using data.table:

DT[, date := as.IDate(paste0(year, "-12-31"))][,
    c("y0", "y1") := .(value, shift(value, -1L, fill=value[.N])), name]

longDT <- DT[, {
    eom <- seq(min(date)+1L, max(date)+1L, by="1 month") - 1L
    v <- unlist(mapply(function(a, d) a + (0:11) * d, y0, (y1 - y0)/12, SIMPLIFY=FALSE))
    .(eom, v=v[seq_along(eom)])
}, name]

dcast(longDT, eom ~ name, sum, value.var="v")

output:

           eom     China   Germany     Japan
 1: 1996-12-31  40164.17  389835.2  707623.3
 2: 1997-01-31  76981.32  747184.1 1356278.1
 3: 1997-02-28 113798.48 1104533.0 2004932.8
 4: 1997-03-31 150615.63 1461881.9 2653587.5
 5: 1997-04-30 187432.78 1819230.8 3302242.2
 6: 1997-05-31 224249.93 2176579.7 3950896.9
 7: 1997-06-30 261067.09 2533928.6 4599551.7
 8: 1997-07-31 297884.24 2891277.5 5248206.4
 9: 1997-08-31 334701.39 3248626.4 5896861.1
10: 1997-09-30 371518.54 3605975.3 6545515.8
11: 1997-10-31 408335.70 3963324.2 7194170.6
12: 1997-11-30 445152.85 4320673.1 7842825.3
13: 1997-12-31 481970.00 4678022.0 8491480.0
14: 1998-01-31 490952.42 4740402.0 8443235.5
15: 1998-02-28 499934.83 4802782.0 8394991.0
16: 1998-03-31 508917.25 4865162.0 8346746.5
17: 1998-04-30 517899.67 4927542.0 8298502.0
18: 1998-05-31 526882.08 4989922.0 8250257.5
19: 1998-06-30 535864.50 5052302.0 8202013.0
20: 1998-07-31 544846.92 5114682.0 8153768.5
21: 1998-08-31 553829.33 5177062.0 8105524.0
22: 1998-09-30 562811.75 5239442.0 8057279.5
23: 1998-10-31 571794.17 5301822.0 8009035.0
24: 1998-11-30 580776.58 5364202.0 7960790.5
25: 1998-12-31 589759.00 5426582.0 7912546.0
           eom     China   Germany     Japan

data:

library(data.table)
DT <- fread("name     year            value
China                 1996            40164.17
Japan                 1996            707623.33
Germany               1996            389835.17
China                 1997            481970
Japan                 1997            8491480
Germany               1997            4678022
China                 1998            589759
Japan                 1998            7912546
Germany               1998            5426582")

I have taken the liberty to add in the data for 1996.

chinsoon12
  • 25,005
  • 4
  • 25
  • 35