I have a question regarding how to perform what would be the equivalent of returning a value using the INDEX MATCH functions in Excel and applying it in Python.
As an Excel user performing data analytics and manipulation on large data-sets I have moved to Python for efficiency. What I am attempting to do is to populate the column cells within a pandas dataframe based on the value returned from the value stored within a dictionary.
In an attempt to do this I have used the following code:
# imported csv DataFrames
crew_data = pd.read_csv(r'C:\file_path\crew_data.csv')
export_template = pd.read_csv(r'C:\file_path\export_template.csv')
#contract number dictionary
contract = {'Northern':'046-2019',
'Southern':'048-2015D',}
#function that attempts to perform a INDEX MATCH equivalent
def contract_num():
for x, y in enumerate(crew_data.loc[:, 'Region']):
if y in contract.keys():
num = contract[y]
else:
print('ERROR')
return(num)
#for loop which prepares then exports the load data
for i, r in enumerate(export_template):
export_template.loc[:, 'Contract'] = contract_num()
export_template.to_csv(r'C:\file_path\export_files\UPLOADER.csv')
print(export_template)
To summarise what the code is intended to do is as follows:
- The for loop contained in the contract_num function begins by iterating over the Region column in the crew_data DataFrame
- if the value y from the DataFrame matches the key in the contract dictionary (Note: the Region column only contains 2 values, 'Southern' and 'Northern') it will return the corresponding value from the value in the contract dictionary
- The for loop which prepares then exports the load data calls on the contract_num() function to populate the Contract column in the export_template DataFrame
Please note that there are 116 additional columns which are populated in this loop which have been excluded from the code above to save space.
When the code is executed it produces the result as intended, however, the issue is that when the function is called in the second for loop it only returns a single value of 048-2015D instead of the value which corresponds to the correct Region.
As mentioned previously this would have typically been carried out in Excel using INDEX MATCH, however doing so is not as efficient as using a script such as that above.
Being a beginner, I suspect the example code may appear con-deluded and unnecessary and could be performed using a more concise method.
If anyone could provide a solution or guidance that would be greatly appreciated.