0

I am currently trying to merge two csv files using the following code:

import pandas as pd
data1 = pd.read_csv("QLD.csv") 
data2 = pd.read_csv("VIC.csv")
result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP']], data2[['REGION', 'TOTALDEMAND', 'RRP']], on='SETTLEMENTDATE')
result.to_csv("masterfile.csv", index=False)

This is what the head of each of my csv files look like: QLD VIC

When I run my code I receive this error:

Traceback (most recent call last):
  File "/Users/george/Desktop/collate/asdas.py", line 4, in <module>
    result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP']], data2[['REGION', 'TOTALDEMAND', 'RRP']], on='SETTLEMENTDATE')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 551, in __init__
    self.join_names) = self._get_merge_keys()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 857, in _get_merge_keys
    rk, stacklevel=stacklevel))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 1382, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'SETTLEMENTDATE'

Any idea what's going wrong? Thanks

edit1: settlement data centre

user8261831
  • 464
  • 1
  • 4
  • 20

2 Answers2

1

You wish to merge on 'SETTLEMENTDATE', but this series is excluded from your input dataframes. Hence Pandas raises KeyError. Try instead:

result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP', 'SETTLEMENTDATE']],
                  data2[['REGION', 'TOTALDEMAND', 'RRP', 'SETTLEMENTDATE']],
                  on='SETTLEMENTDATE')
jpp
  • 159,742
  • 34
  • 281
  • 339
  • That makes sense thank you! – user8261831 Jan 20 '19 at 03:55
  • One follow up question, is there a way to have settlement to to the left of all my data, currently it is in the centre. I have edited my original post to illustrate this – user8261831 Jan 20 '19 at 03:57
  • You can use [Move column by name to front of table in pandas](https://stackoverflow.com/questions/25122099/move-column-by-name-to-front-of-table-in-pandas) after your merge. – jpp Jan 20 '19 at 03:58
1

Try:

result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP','SETTLEMENTDATE']], data2[['REGION', 'TOTALDEMAND', 'RRP','SETTLEMENTDATE']], on='SETTLEMENTDATE')

You are merging on SETTLEMENTDATE but not taking the same in the slice of the dataframe you are merging on.

anky
  • 74,114
  • 11
  • 41
  • 70