Given, I have two dataframes. The first is a dataframe with 22 columns (df1
). This holds the values for all variables. The second (good_models
) is a dataframe which holds the columns of interest that I want to extract from df1
. For every row in the good_models
, I need to use the values from (var1-var10
and target
) and keep only these columns from the df1
. Rename the columns (var1-var10
and target
) then append to a new dataframe called model_data_long
I can perform this task using a for loop, however it is very slow and I am hoping there is a more efficient way to perform this.
Potentially I can reference this, however am unsure on how to apply it. Stack Overflow
import numpy as np
import pandas as pd
df1=pd.DataFrame(np.random.randint(0,100,size=(100,20+1)),columns=list(range(0,20+1)))
df1['target']=np.random.randint(2,size=100)
### This needs to be the columns in model_data_long
labels=['var1','var2','var3','var4','var5','var6','var7','var8','var9','var10','target']
### Contains the columns I want to exctract from df1 and append to model_data_long
good_models=pd.DataFrame.from_records([(0,1,2,3,4,5,6,7,8,9,'target'),
(9,8,7,6,5,4,3,2,1,0,'target'),
(20,19,18,17,16,15,14,13,12,11,'target')],columns=labels)
### works but is slow
model_data_long=pd.DataFrame()
for i in range(0,len(good_models)):
### Extracting the values for a record from good_models
t_list=good_models[good_models.index==i].values.tolist()[0]
### Keeping only the columns from t_list from the df1 frame.
temp_data=pd.DataFrame(data=df1.filter(items=t_list,axis=1))
### renaming the columns in temp_data
temp_data.columns=[labels]
### It is imparative that I have an index variable in the model_data_long dataframe.
### Setting the model_index variable, critical.
temp_data['model_index']=i
### Finally, append to a long running dataframe.
model_data_long=model_data_long.append([temp_data],ignore_index=True)