0

I am trying to identify the appropriate format used in my obtained dataframe, but I am having trouble finding anything that works.

The issue is that the format contains yearly figures for which the month is assumed to be some sort of zero-padded zeroth month. For example, the yearly nominal GDP is reported as 2014-00 instead of the usual 2014-01.

Thus, when I use,

df['end_of_month'] =pandas.to_datetime(df['end_of_month'], format="%Y-%m")

I get that:

ValueError: time data 2014-00 doesn't match format specified

For your consideration, here is the dataframe:

end_of_month  nominal_gdp
0       2014-00    2260005.0
1       2015-00    2398280.0
2       2016-00    2490617.0
3       2017-00    2662836.0
4       2018-00    2842883.0
5       2018-09     726352.0
6       2018-10          NaN
7       2018-11          NaN
8       2018-12     754904.0
9       2019-01          NaN
10      2019-02          NaN
11      2019-03     712514.0
12      2019-04          NaN
13      2019-05          NaN
14      2019-06     698044.0
15      2019-07          NaN
16      2019-08          NaN
17      2019-09     722831.0
18      2019-10          NaN
19      2019-11          NaN

For anyone interested or whomever might face a similar issue, the data was obtained from the Hong Kong Monetary Authority, using their Open API initiative. For more info visit HKMA's documentation.

Specifically, this issue arises when using the Economic Statistics dataset, which can be found in the following page of the documentation.

GACharala
  • 39
  • 8
  • 1
    Yes, what is `2014-00` supposed to mean? Becuase there is a `12` month value, therefore there are 13 possible values for each year? – Celius Stingher Dec 26 '19 at 14:47
  • How could there be both `00` and `12` in the month field? – vb_rises Dec 26 '19 at 15:00
  • Maybe `00` is the total sum of the corresponding year? Or some kind of aggregation? – Celius Stingher Dec 26 '19 at 15:02
  • The issue here is that when you see a yyyy-00 it's supposed to be a yearly frequency value of that respective year, whereas when you see a yyyy-01 it represents the monthly figure for January. – GACharala Dec 26 '19 at 15:03

1 Answers1

0

It appears that I have managed to find a solution for the issue. Here is the link where I found how to solve it: Handling multiple datetime formats with pd.to_datetime

Here is the line I used:

df['end_of_month'] = pandas.to_datetime(df['end_of_month'], format='%Y-%m',errors='coerce').fillna(pandas.to_datetime(df['end_of_month'], format='%Y-00',errors='coerce'))

It simply "fills" the coerced rows with a different format. The format I used for the yearly figures whose zero-padded zeroth month has no meaning, is : "%Y-00" since we can diregard the "-00" which has no meaning for the yearly frequency values.

GACharala
  • 39
  • 8