I have a DataFrame that is 2,000 rows long, and it takes me about 5 minutes to map 2,000 IDs to their corresponding names, which is far too long. I'm trying to reduce the mapping time. One option I want to try is mapping each ID as I build the dictionary, rather than storing more than one entry at a time in the dictionary.
Here is the process I'm using:
df_IDs =
studentID  grade  age
123        12th   18
432        11th   17
421        11th   16
I want to replace the 'studentID' column with the student names, which I get from a mapping file loaded as a DataFrame (student_directory_df).
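The replacement itself can be written as a single dictionary lookup. This is a minimal sketch that makes a few assumptions. It assumes student_directory_df has one row per ID, with 'studentID' and 'name' columns (the names the code further down uses). It also assumes IDs match exactly rather than as substrings. The directory contents are hypothetical, chosen to reproduce the desired output.

```python
import pandas as pd

# Hypothetical sample data; IDs and names chosen to match the desired output
df_IDs = pd.DataFrame({
    "studentID": [123, 432, 421],
    "grade": ["12th", "11th", "11th"],
    "age": [18, 17, 16],
})
student_directory_df = pd.DataFrame({
    "studentID": [421, 123, 999, 432],  # 999 is an extra, unused entry
    "name": ["sarah", "sally", "tom", "joe"],
})

# Build the ID -> name lookup once, in a single pass over the directory
id_to_name = dict(zip(student_directory_df["studentID"], student_directory_df["name"]))

# Look up every ID at once; IDs missing from the directory keep their original value
df_IDs["studentID"] = df_IDs["studentID"].map(id_to_name).fillna(df_IDs["studentID"])
print(df_IDs)
```

The dictionary is built once, and `Series.map` handles the per-row lookups. This avoids rescanning the whole directory with a string search for every ID.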
I wrote functions that build a dictionary and map the IDs to the names:
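import numpy as np
import pandas as pd

dicts = {}

def search_tool_student(df, column, accession):
    # Row positions whose `column` contains `accession` as a literal substring
    # (IDs are cast to str so the .str accessor also works on integer IDs)
    index = np.where(df[column].astype(str).str.contains(str(accession), regex=False, na=False))
    if len(index[0]) == 0:
        done = ""
    else:
        done = df.iloc[index[0]]
    return done

def create_dict(x):
    # Look up one ID in the directory and store its first matching name in the global dict
    result_df = search_tool_student(student_directory_df, 'studentID', x)
    if len(result_df) == 0:
        print('bad match found: ' + str(x))
    else:
        student_name = result_df['name'].iloc[0]
        dicts[x] = student_name
    return dicts

def map_ID(df_IDs):
    studentIDs = df_IDs['studentID']
    # Every call returns the same global dict, so element [0] holds all the mappings
    new_dict = list(map(create_dict, studentIDs))[0]
    # Replace IDs with names, keeping the original ID where no name was found
    df_IDs['studentID'] = df_IDs['studentID'].map(new_dict).fillna(df_IDs['studentID'])
    return df_IDs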
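The "bad match found" check in create_dict reports IDs that have no entry in the directory. A left merge can do the lookup and that reporting in one vectorized step. This is a sketch that makes a few assumptions. It assumes the same 'studentID' and 'name' columns used above and exact, not substring, ID matches. `drop_duplicates` keeps the first name per ID, mirroring `.iloc[0]`. The sample frames are hypothetical, and 555 is deliberately missing from the directory.

```python
import pandas as pd

# Hypothetical sample data; 555 has no entry in the directory
df_IDs = pd.DataFrame({
    "studentID": [123, 432, 421, 555],
    "grade": ["12th", "11th", "11th", "10th"],
    "age": [18, 17, 16, 15],
})
student_directory_df = pd.DataFrame({
    "studentID": [421, 123, 432],
    "name": ["sarah", "sally", "joe"],
})

# Left join keeps every row of df_IDs in order; indicator=True adds a "_merge" column
merged = df_IDs.merge(
    student_directory_df[["studentID", "name"]].drop_duplicates("studentID"),
    on="studentID",
    how="left",
    indicator=True,
)

# Counterpart of the per-ID "bad match found" messages, collected in one go
bad_ids = merged.loc[merged["_merge"] == "left_only", "studentID"].tolist()
print("bad matches found:", bad_ids)

# Replace IDs with names, keeping the original ID where no name was found
merged["studentID"] = merged["name"].fillna(merged["studentID"])
result = merged.drop(columns=["name", "_merge"])
print(result)
```

Compared with a plain dictionary plus `Series.map`, the merge costs a little more setup. In exchange, it gives an explicit list of unmatched IDs instead of one print per failed lookup.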
Desired output:
studentID  grade  age
sally      12th   18
joe        11th   17
sarah      11th   16