I'm trying to build a section of a program that iterates through each column of every row and produces output that depends on the contents of that column.
Dataframe:
import pandas as pd
df = pd.DataFrame({'Key': ['1234', '4321', '2341', '4132'],
                   'Value1': ['JFK', 'LAX', 'ATL', 'CLT'],
                   'Value2': ['NYC', 'CA', 'GA', 'NC'],
                   'Value3': ['Yes', 'No', 'No', 'No']})
For each row, I want to look at the key, then check the contents of Value1 and produce output based on an if statement, then do the same for Value2 and Value3.
The IF statement logic would look similar to:
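# itertuples yields one namedtuple per row; its fields are named after the columns
for row in df.itertuples(index=False):
    if row.Value1 == 'JFK':
        print('John F Kennedy')
    else:
        print('N/A')
    if row.Value2 == 'NYC':
        print('New York City')
    else:
        print('N/A')
    if row.Value3 == 'Yes':
        print('Able')
    elif row.Value3 == 'No':
        print('Unable')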
I would like the output to be a DataFrame as well, pairing each key with a concatenation of all the printed strings from the logic above. It would look something like this:
result_df = pd.DataFrame({'Key': ['1234'],
                          'Result': ['John F Kennedy, New York City, Able']})
I've simplified the logic above; in reality there would be many more conditions to check.
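When most of the rules have the form "if this column equals X, output Y", one option (an assumption about how the real rules look) is to keep them in a dictionary per column and let `Series.map` do the lookup. Values with no entry come back as NaN, and `fillna('N/A')` plays the role of the else branch. The `rules` dictionary below is a hypothetical stand-in for the real conditions, and it treats a Value3 that is neither 'Yes' nor 'No' as 'N/A', which the original logic leaves unspecified.

```python
import pandas as pd

df = pd.DataFrame({'Key': ['1234', '4321', '2341', '4132'],
                   'Value1': ['JFK', 'LAX', 'ATL', 'CLT'],
                   'Value2': ['NYC', 'CA', 'GA', 'NC'],
                   'Value3': ['Yes', 'No', 'No', 'No']})

# Hypothetical lookup tables, one per column, mirroring the if/else logic
rules = {
    'Value1': {'JFK': 'John F Kennedy'},
    'Value2': {'NYC': 'New York City'},
    'Value3': {'Yes': 'Able', 'No': 'Unable'},
}

# Series.map returns NaN for values missing from the dict; fillna acts as "else"
mapped = pd.DataFrame({col: df[col].map(lookup).fillna('N/A')
                       for col, lookup in rules.items()})

# Concatenate the per-column results with ", " for every row at once
columns = list(rules)
result = mapped[columns[0]].str.cat([mapped[c] for c in columns[1:]], sep=', ')

result_df = pd.DataFrame({'Key': df['Key'], 'Result': result})
print(result_df)
```

Adding a new rule then means adding a dictionary entry rather than another if/else. This only covers exact-match rules; conditions that combine several columns are a better fit for `np.select`.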
Any help would be great. I just need a push in the right direction on getting pandas to iterate through the columns of a row.
Thank you!
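A minimal sketch of the row-wise approach the question describes, assuming the rules fit in a plain Python function: the hypothetical helper `describe_row` applies the same if/else checks to one row and returns the joined string, and `DataFrame.apply(..., axis=1)` calls it once per row, passing the row as a Series indexed by column name. As an assumption, a Value3 that is neither 'Yes' nor 'No' becomes 'N/A'.

```python
import pandas as pd

df = pd.DataFrame({'Key': ['1234', '4321', '2341', '4132'],
                   'Value1': ['JFK', 'LAX', 'ATL', 'CLT'],
                   'Value2': ['NYC', 'CA', 'GA', 'NC'],
                   'Value3': ['Yes', 'No', 'No', 'No']})


def describe_row(row):
    """Hypothetical helper: apply the per-column rules to one row."""
    parts = []
    # Value1 rule
    parts.append('John F Kennedy' if row['Value1'] == 'JFK' else 'N/A')
    # Value2 rule
    parts.append('New York City' if row['Value2'] == 'NYC' else 'N/A')
    # Value3 rule (anything other than Yes/No is assumed to be 'N/A')
    if row['Value3'] == 'Yes':
        parts.append('Able')
    elif row['Value3'] == 'No':
        parts.append('Unable')
    else:
        parts.append('N/A')
    return ', '.join(parts)


# axis=1 hands each row to describe_row as a Series keyed by column name
result_df = pd.DataFrame({'Key': df['Key'],
                          'Result': df.apply(describe_row, axis=1)})
print(result_df)
```

`apply` with `axis=1` still runs a Python-level loop, so it is convenient rather than fast; on large frames the vectorized `np.select` approach in the resolution below scales better.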
Resolution:
Since my question may not have been very clear, I dug into NumPy and pandas a bit more and found a solution for anyone who runs into a similar issue.
I wanted a way to go through a DataFrame and, based on the contents of specific columns, create another column with the results, essentially mimicking an IF statement in Excel. Several posters discouraged using iterrows(), so I didn't go down that route. Instead I found np.select() in NumPy, which evaluates the conditions on whole columns at once rather than row by row.
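`np.select` takes a list of boolean conditions and a matching list of values. For each element it returns the value paired with the first condition that is True, and the `default` when none are. A minimal illustration on a plain NumPy array (the array and labels are made up for illustration); note that 1 satisfies both conditions but gets the first match:

```python
import numpy as np

x = np.array([1, 5, 10])

# Conditions are checked in order; the first True one wins for each element
conditions = [x < 3, x < 8]
choices = ['small', 'medium']

# Elements matching no condition fall back to the default
labels = np.select(conditions, choices, default='large')
print(labels)
```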
Assume the following dataframe:
data = {'ID': ['1254', '4568', '9547', '7856'],
        'Primary': [True, False, True, False],
        'Secondary': [True, False, False, True]}
df = pd.DataFrame(data)
print(df)
I want the new column, Tertiary, to be a function of the contents of Primary and Secondary. For example, Tertiary should equal "Yes" when both Primary and Secondary are True. Instead of iterating, I used np.select():
import pandas as pd
import numpy as np
data = {'ID': ['1254', '4568', '9547', '7856'],
        'Primary': [True, False, True, False],
        'Secondary': [True, False, False, True]}
df = pd.DataFrame(data)
print(df)
# Each condition is a boolean Series; for every row, np.select picks the
# value paired with the first condition that is True
primary_secondary_flag_conditions = [
    df['Primary'] & df['Secondary'],
    ~df['Primary'] & ~df['Secondary'],
    df['Primary'] & ~df['Secondary'],
    ~df['Primary'] & df['Secondary'],
]
primary_secondary_flag_values = [
    "Yes",
    "No",
    "Maybe",
    "Maybe",
]
# Rows matching no condition get the default (an empty string), so no
# separate fillna step is needed
df['Tertiary'] = np.select(primary_secondary_flag_conditions,
                           primary_secondary_flag_values, default="")
print(df)
The resulting DataFrame is the same as one built directly like this:
import pandas as pd
import numpy as np
data = {'ID': ['1254', '4568', '9547', '7856'],
        'Primary': [True, False, True, False],
        'Secondary': [True, False, False, True],
        'Tertiary': ["Yes", "No", "Maybe", "Maybe"]}
df = pd.DataFrame(data)
print(df)
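For comparison, here is a sketch of the same np.select pattern applied to the Key/Value1/Value2/Value3 data from the question: one np.select call per column, with `default='N/A'` covering the else branches, and the three results joined into the Result column. The condition lists are stand-ins for the real rules, and treating an unmatched Value3 as 'N/A' is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Key': ['1234', '4321', '2341', '4132'],
                   'Value1': ['JFK', 'LAX', 'ATL', 'CLT'],
                   'Value2': ['NYC', 'CA', 'GA', 'NC'],
                   'Value3': ['Yes', 'No', 'No', 'No']})

# One np.select call per column; default='N/A' is used where nothing matches
value1 = np.select([df['Value1'] == 'JFK'], ['John F Kennedy'], default='N/A')
value2 = np.select([df['Value2'] == 'NYC'], ['New York City'], default='N/A')
value3 = np.select([df['Value3'] == 'Yes', df['Value3'] == 'No'],
                   ['Able', 'Unable'], default='N/A')

# Join the three string arrays element-wise into one result per row
df['Result'] = [', '.join(parts) for parts in zip(value1, value2, value3)]
result_df = df[['Key', 'Result']]
print(result_df)
```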