
I am new to Python and am trying to write a script for some data wrangling. It should read .csv files and write out a few derived outputs. The script is shown below:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%d/%m/%Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%d/%m/%Y"))
df.to_csv('gas-details_month.csv', index=True)

When I run the script with the command python3 hello.py, I get the following error:

Traceback (most recent call last):
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Day'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "hello.py", line 6, in <module>
    dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'Day'

Please help, I'd really appreciate this.

1 Answer

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Month"],format='%b %Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%d/%m/%Y"))
df.to_csv('gas-details_month.csv', index=True)

With the data file you linked, this code will work.

Your dataset has a "Month" column, not a "Day" column, so dfg["Day"] raised the KeyError.
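A quick way to confirm which column names pandas actually read is to print dfg.columns before indexing. The sketch below is only an illustration: instead of the real file it uses an in-memory stand-in built from a few rows of the output shown in this answer, so no skiprows is needed here.

```python
import io

import pandas as pd

# Stand-in for the downloaded file (an assumption for illustration):
# a header plus three rows copied from the output listed in this answer.
sample = io.StringIO(
    "Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu\n"
    "Jan 2021,2.71\n"
    "Dec 2020,2.59\n"
    "Nov 2020,2.61\n"
)
dfg = pd.read_csv(sample)

# List the column names exactly as pandas parsed them.
columns = list(dfg.columns)
print(columns)

# Membership tests on the column index avoid a KeyError.
has_day = "Day" in dfg.columns
has_month = "Month" in dfg.columns
print("Day present:", has_day, "| Month present:", has_month)
```

If the name you expect is missing from that list, dfg["that name"] will raise a KeyError like the one in the question.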

The dates in your dataset are also in the form "%b %Y" (for example, Jan 2021), so I changed the format string to match.
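As a small illustration of the format string, here is a sketch that parses a few of the month labels from the output in this answer. %b is the abbreviated month name and %Y the four-digit year; parsing month names assumes an English locale.

```python
import pandas as pd

# Month labels copied from the output listed in this answer.
months = pd.Series(["Jan 2021", "Dec 2020", "Nov 2020"])

# "%b %Y" matches an abbreviated month name followed by a four-digit year.
# With no day in the string, each value parses to the first of the month.
parsed = pd.to_datetime(months, format="%b %Y")
print(parsed)

# A format that does not match the strings raises a ValueError.
try:
    pd.to_datetime(months, format="%d/%m/%Y")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print("Mismatched format raised ValueError:", mismatch_raised)
```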

Dataset link provided by the OP: https://www.eia.gov/dnav/ng/hist/rngwhhdm.htm

Output: the script generates two CSV files:

1) gas-details_day.csv

Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
Nov 2020,2.61
Oct 2020,2.39
Sep 2020,1.92
Aug 2020,2.3
...

2) gas-details_month.csv

Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
31/01/1997,3.45
28/02/1997,2.15
31/03/1997,1.89
30/04/1997,2.03
31/05/1997,2.25
30/06/1997,2.2
...
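The month-end labels in the second file can also be produced another way. The sketch below is an alternative, not the code above: it groups by calendar month with to_period("M") instead of resample('M') (recent pandas releases deprecate the 'M' resample alias in favour of 'ME'), and it assigns the formatted labels directly to the result's index, so nothing depends on how the DataFrame constructor aligns data to a new index. The input is three rows copied from the first output file, and the PRICE constant is just a shorthand introduced for this example.

```python
import pandas as pd

# Shorthand for the long column name (introduced only for this example).
PRICE = "Henry Hub Natural Gas Spot Price Dollars per Million Btu"

# Three rows copied from the gas-details_day.csv output above.
dfg = pd.DataFrame(
    {"Month": ["Jan 2021", "Dec 2020", "Nov 2020"], PRICE: [2.71, 2.59, 2.61]}
)
dfg.index = pd.to_datetime(dfg["Month"], format="%b %Y")

# Sum the prices per calendar month; groupby sorts the months ascending.
monthly = dfg[PRICE].groupby(dfg.index.to_period("M")).sum()

# Label each month by its last day, formatted as day/month/year.
monthly.index = monthly.index.to_timestamp(how="end").strftime("%d/%m/%Y")
monthly.index.name = "Month"

# With no path argument, to_csv returns the CSV text as a string.
csv_text = monthly.to_frame().to_csv()
print(csv_text)
```

To write a file instead of printing, pass a path such as 'gas-details_month.csv' to to_csv.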
Rishabh Kumar