I'm converting an Excel spreadsheet to Python so as to automate and speed up several tasks. I need to add several columns to the DataFrame and add data to them based on values in a previous column. I've got it working using two nested for loops, but it's really slow and I know Pandas is not designed for cell-by-cell work. Here's a sample of my problem:
import pandas as pd
results = pd.DataFrame({'scores':[78.5, 91.0, 103.5], 'outcomes':[1,0,1]})
thresholds = [103.5, 98.5, 93.5, 88.5, 83.5, 78.5]
for threshold in thresholds:
results[str(threshold)] = 0
for index, row in results.iterrows():
if row['scores'] > threshold:
results.set_value(index, str(threshold), row['outcomes'])
print (results)
And the correct output:
outcomes scores 103.5 98.5 93.5 88.5 83.5 78.5
0 1 78.5 0 0 0 0 0 0
1 0 91.0 0 0 0 0 0 0
2 1 103.5 0 1 1 1 1 1
What is a more efficient way of doing this? I've been toying with the idea of transposing the DataFrame to work by columns instead of rows, but I can't get anything to work. Thanks for your help!