Error running Python script for data wrangling

Question

I am new to Python and am trying to write a script for some data wrangling. It reads .csv files and writes out a few derived outputs. The script is:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%d/%m/%Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = ddfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%d/%m/%Y"))
df.to_csv('gas-details_month.csv', index=True)

When I run the script with the command python3 hello.py, I get this error:

Traceback (most recent call last):
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Day'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "hello.py", line 6, in <module>
    dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'Day'

Please help, I'd really appreciate this.

Recheck your DataFrame, as there is no column named "Day". Maybe the column name is "day". – Anurag Dabas, Feb 20 '21 at 07:32
I checked the dataset here: https://github.com/Levantado/henry_hub_natural_gas_spot_price-/blob/master/IN_DATA/Henry_Hub_Natural_Gas_Spot_Price.csv . Is your dataset the same? Check whether line 5 of your dataset has a column named `Day`. – Rishabh Kumar, Feb 20 '21 at 07:40
@RishabhKumar The dataset is not the same. Here is my dataset: https://www.eia.gov/dnav/ng/hist/rngwhhdm.htm – Arogbonlo Samuel, Feb 20 '21 at 09:00

Rishabh Kumar · Accepted Answer · 2021-02-20T09:53:10.830

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Month"],format='%b %Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%d/%m/%Y"))
df.to_csv('gas-details_month.csv', index=True)

Based on your data this code will work.

Your dataset has a "Month" column instead of a "Day" column, which is what raised the KeyError.
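A quick way to confirm which columns pandas actually parsed is to look at `df.columns` before indexing. The sketch below is only an illustration: the inline sample is an assumption standing in for the downloaded file (whose exact header rows are not shown in this thread), and the names `sample_csv` and `wanted` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV; the real file may differ.
sample_csv = io.StringIO(
    "Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu\n"
    "Jan 2021,2.71\n"
    "Dec 2020,2.59\n"
)

df = pd.read_csv(sample_csv)

# Inspect the parsed column names before selecting one.
columns = list(df.columns)
print(columns)

# Guard the lookup so a missing column gives a readable message
# instead of a KeyError deep inside pandas.
wanted = "Day"
if wanted in df.columns:
    dates = df[wanted]
else:
    print(f"Column {wanted!r} not found; available columns: {columns}")
```

Running the same check against the real file (with the appropriate `skiprows`) should show whether the header is "Month", "Day", or something else.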

Also, the dates in your dataset are in the format "%b %Y", so I changed that part too.
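As a small illustration of that format string, the sketch below parses a few month labels in the style shown in the output further down. It assumes an English locale, since `%b` matches abbreviated month names such as "Jan"; the variable names are only for illustration.

```python
import pandas as pd

# Month labels in the same style as the dataset's "Month" column.
months = pd.Series(["Jan 2021", "Dec 2020", "Nov 2020"])

# %b = abbreviated month name, %Y = four-digit year.
# No day is given, so each value is parsed as the first of the month.
parsed = pd.to_datetime(months, format="%b %Y")
print(parsed)
```

A stray character in the format, such as a forward slash, would no longer match strings like "Jan 2021" and parsing would fail.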

Dataset link provided by OP : https://www.eia.gov/dnav/ng/hist/rngwhhdm.htm

Output: Generates 2 csv files:

1 ) gas-details_day.csv

Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
Nov 2020,2.61
Oct 2020,2.39
Sep 2020,1.92
Aug 2020,2.3
...

2 ) gas-details_month.csv

Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
31/01/1997,3.45
28/02/1997,2.15
31/03/1997,1.89
30/04/1997,2.03
31/05/1997,2.25
30/06/1997,2.2
...

edited Feb 20 '21 at 09:53

answered Feb 20 '21 at 09:12

Rishabh Kumar


It is similar to the previous one. I'll add it as an answer so you'd see it – Arogbonlo Samuel Feb 20 '21 at 09:52
Can you see it? – Arogbonlo Samuel Feb 20 '21 at 09:53
fixed my answer. There was a typo. Check now – Rishabh Kumar Feb 20 '21 at 09:54
Oh wow, awesome. It worked. What typo did you correct? – Arogbonlo Samuel Feb 20 '21 at 09:55
I removed a forwardslash in "%b %Y". You can see this in edit history of my answer too. – Rishabh Kumar Feb 20 '21 at 09:57
Awesome. Is there a way to automate the retrieval of data from the site instead of always downloading the data pdf and uploading in the python script directory? – Arogbonlo Samuel Feb 20 '21 at 09:59
Yes it is totally possible, this is actually off-topic, I shouldn't be answering here. You can check this link to learn more : https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url – Rishabh Kumar Feb 20 '21 at 10:03
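For readers following up on that last comment, here is a minimal sketch of the idea, not a tested recipe for the EIA site. `pd.read_csv` accepts a URL as well as a local path or file-like object, so the same loading code can serve both cases. The helper name `load_prices` is hypothetical, the URL in the final comment is a placeholder rather than a real endpoint, and the page linked in the question is an .htm page, so a direct CSV link would have to be located first.

```python
import io

import pandas as pd


def load_prices(source, skiprows=0):
    """Read a CSV from a local path, a URL, or a file-like object."""
    # pd.read_csv accepts http(s) URLs directly, so no manual download step is needed.
    return pd.read_csv(source, skiprows=skiprows)


# Offline demonstration with an in-memory buffer (shortened sample rows).
demo = load_prices(io.StringIO("Month,Price\nJan 2021,2.71\nDec 2020,2.59\n"))
print(demo.shape)

# With a real CSV link (placeholder URL, shown only to illustrate the call):
# prices = load_prices("https://example.com/Henry_Hub_Natural_Gas_Spot_Price.csv", skiprows=4)
```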
