I have two pandas DataFrames. df_lst contains a list of column names and their expected values, and df holds the data.
The column names in df_lst may change, so I use the following script to look up the index in df of each column named in df_lst. I am showing this code in case it is an extra step that is not needed.
ind_dict = dict((k,i) for i,k in enumerate(df.columns))
inter = set(df_lst['Col_Name']).intersection(df)
df_lst['Index'] = [ ind_dict[x] for x in inter ]
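One caveat with the set-based lookup above: a set does not keep the order of df_lst['Col_Name'], and the resulting list can be shorter than df_lst when a name is missing, so the assignment may misalign or fail. Below is a minimal sketch of an order-preserving alternative using Index.get_indexer. It assumes the names in Col_Name match the column labels of df exactly, and the two small frames are made up for illustration.

```python
import pandas as pd

# Hypothetical small frames; Col_Name is assumed to match df's labels exactly.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "g", "j"])
df_lst = pd.DataFrame({"Col_Name": ["j", "g", "missing"]})

# get_indexer returns the 0-based position of each name in df.columns,
# in the same order as df_lst['Col_Name'], and -1 for names not found.
df_lst["Index"] = df.columns.get_indexer(df_lst["Col_Name"])
print(df_lst)
```

Rows with -1 can then be filtered out or reported before any further lookup.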
The input for this task would look like this:
import random
import numpy as np
import pandas as pd
a = np.random.randint(12, size=(7, 11))
df = pd.DataFrame(a, ['foo','foo','bar', 'bar', 'bar', 'foo', 'foo'], ['a','b','f','g','h','j' ,'k', 'r', 's', 't', 'z'])
df_lst = pd.DataFrame({'Col_Name': ['Col_g', 'Col_j', 'Col_r', 'Col_s'],
                       'Expected Value': [100, 90, 122, 111],
                       'Index': [4, 6, 8, 9]})
How can I use the new Index values to select the corresponding column in df, sum its values, and return, for each row in df_lst, both the sum and True if it is greater than the Expected Value or False if it is less? The desired output would look like this:
df_out = pd.DataFrame({'Col_Name': ['Col_g', 'Col_j', 'Col_r', 'Col_s'],
                       'Expected Value': [100, 90, 122, 111],
                       'Index': [4, 6, 8, 9],
                       'Sum of Col': ['sum of col_g', 'sum of col_j', 'sum of col_r', 'sum of col_s'],
                       'Bool': ['True or False', 'True or False', 'True or False', 'True or False']
                       })
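For reference, one possible way to build such a frame is sketched below. It assumes the Index column holds 0-based column positions, as enumerate would produce (the sample Index values may not line up with the Col_ names under that assumption), and it seeds the random data only so the sketch is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
a = rng.integers(12, size=(7, 11))
df = pd.DataFrame(a, ['foo', 'foo', 'bar', 'bar', 'bar', 'foo', 'foo'],
                  ['a', 'b', 'f', 'g', 'h', 'j', 'k', 'r', 's', 't', 'z'])
df_lst = pd.DataFrame({'Col_Name': ['Col_g', 'Col_j', 'Col_r', 'Col_s'],
                       'Expected Value': [100, 90, 122, 111],
                       'Index': [4, 6, 8, 9]})

df_out = df_lst.copy()
# Select columns by position (assumption: Index is 0-based) and sum each one.
positions = df_out['Index'].to_numpy()
df_out['Sum of Col'] = df.iloc[:, positions].sum(axis=0).to_numpy()
# Element-wise comparison gives one True/False per row of df_lst.
df_out['Bool'] = df_out['Sum of Col'] > df_out['Expected Value']
print(df_out)
```

Selecting all positions at once with iloc avoids a Python-level loop over the rows of df_lst.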
Eventually this True/False data will feed a while loop that checks something like "while one or more values are False, do X".
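A rough sketch of that kind of loop condition, using Series.all() to test whether every row is True. The small frame and the "X" step here are placeholders invented for illustration; the real update would replace the line that bumps the sums.

```python
import pandas as pd

# Hypothetical stand-in for df_out with made-up numbers.
df_out = pd.DataFrame({'Expected Value': [10, 20], 'Sum of Col': [12, 5]})
df_out['Bool'] = df_out['Sum of Col'] > df_out['Expected Value']

passes = 0
# Keep looping while at least one row is False.
while not df_out['Bool'].all():
    # Placeholder for "X": the sums are simply increased so the loop terminates.
    df_out['Sum of Col'] += 10
    # Recompute the flags after every change, otherwise the loop never ends.
    df_out['Bool'] = df_out['Sum of Col'] > df_out['Expected Value']
    passes += 1

print(passes, df_out['Bool'].tolist())
```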