Some 'minor' improvements you could make: use f-strings for your string labelling, e.g. token + 'STR_WT' becomes f'{token}STR_WT'.
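For instance, with a token like the ones used in the loop below, both forms build the same label:

```python
token = 'SFD_'  # example token, matching those used in the loop below

# old: plain concatenation
label_concat = token + 'STR_WT'
# new: f-string
label_fstring = f'{token}STR_WT'

print(label_fstring)  # SFD_STR_WT
```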
You can also replace token.split('_')[0] with token[:-1], since attribute lookups and method calls are generally slower than a plain slice. The two are equivalent here because every token ends with a single underscore and contains no other underscores.
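A quick check that both expressions give the same result for a token of this shape:

```python
token = 'SFD_'

# both strip the trailing underscore, but only for tokens like 'XXX_'
via_split = token.split('_')[0]  # attribute lookup + method call + list allocation
via_slice = token[:-1]           # plain slice, no lookup

print(via_split, via_slice)  # SFD SFD
```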
column_names = ["Geoid", "Occupancy", "BDG_Weights", "CTS_Weights", "BI_Weights"]  # stored once instead of being recreated inside the for loops
n_df = pd.DataFrame(columns=column_names)
# precompute the labels that stay the same inside the inner for loop, so the f-strings are not rebuilt on every iteration
token_names = [[token[:-1], f'{token}STR_WT', f'{token}CNT_WT', f'{token}BI_WT'] for token in ['SFD_', 'MFD_', 'COM_', 'IND_']]  # all of the strings are created at once and can be referenced
for idx, row in dd.iterrows():
    for token in token_names:
        n_df = pd.concat([n_df, pd.DataFrame([[idx, token[0], row[token[1]], row[token[2]], row[token[3]]]], columns=column_names)])
You could also store pd.concat and pd.DataFrame as local variables before the for loop to avoid the repeated attribute lookups.
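A minimal sketch of that, using a tiny stand-in for dd (the data and column names here are hypothetical, for illustration only):

```python
import pandas as pd

# stand-in for dd: one geoid, one token's worth of weight columns
dd = pd.DataFrame(
    {'SFD_STR_WT': [1.0], 'SFD_CNT_WT': [2.0], 'SFD_BI_WT': [3.0]},
    index=['g1'],
)

column_names = ["Geoid", "Occupancy", "BDG_Weights", "CTS_Weights", "BI_Weights"]

# bind the functions to local names once, before the loop
concat = pd.concat
dataframe = pd.DataFrame

n_df = dataframe(columns=column_names)
for idx, row in dd.iterrows():
    n_df = concat([n_df, dataframe(
        [[idx, 'SFD', row['SFD_STR_WT'], row['SFD_CNT_WT'], row['SFD_BI_WT']]],
        columns=column_names,
    )])

print(len(n_df))  # 1
```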
Though these are relatively small tweaks. The largest gains would come from stepping back and asking why these steps need to happen in this order: are there entries you can filter out so that you don't have to loop over all of them?
I'm not completely sure, but it seems what you are trying to do is concatenate entire columns (those with token names) into a single dataframe. It might be worth checking out https://sparkbyexamples.com/pandas/pandas-create-new-dataframe-by-selecting-specific-columns/ to select all of the columns for a specific token at once and concatenate those token values into one dataframe together.
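A minimal sketch of that column-wise approach, assuming dd has token-prefixed columns like those above (the sample data here is made up, and I've only used two tokens to keep it short):

```python
import pandas as pd

# stand-in for dd, with one weight column per (token, measure) pair
dd = pd.DataFrame(
    {
        'SFD_STR_WT': [1.0, 4.0], 'SFD_CNT_WT': [2.0, 5.0], 'SFD_BI_WT': [3.0, 6.0],
        'MFD_STR_WT': [7.0, 8.0], 'MFD_CNT_WT': [9.0, 10.0], 'MFD_BI_WT': [11.0, 12.0],
    },
    index=['g1', 'g2'],
)

frames = []
for token in ['SFD_', 'MFD_']:
    # grab all three weight columns for this token at once, whole columns at a time
    sub = dd[[f'{token}STR_WT', f'{token}CNT_WT', f'{token}BI_WT']].copy()
    sub.columns = ['BDG_Weights', 'CTS_Weights', 'BI_Weights']
    sub.insert(0, 'Occupancy', token[:-1])
    frames.append(sub)

# one concat at the end instead of one per row
n_df = pd.concat(frames).rename_axis('Geoid').reset_index()
print(len(n_df))  # 4
```

This turns the per-row concat into a handful of column selections plus a single concat, which is where pandas is fast.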