I'm trying to build a section of a program that iterates over each row of a dataframe, inspects each column in that row, and produces output that depends on the contents of that column.

Dataframe:

import pandas as pd

df = pd.DataFrame({'Key':    ['1234', '4321', '2341', '4132'], 
                   'Value1': ['JFK', 'LAX', 'ATL','CLT'],
                   'Value2': ['NYC', 'CA', 'GA','NC'],
                   'Value3': ['Yes', 'No', 'No', 'No']})

For each row, I want to look at the specific key and then look at Value1 contents and output based on an if statement, then Value2 and Value3.

The IF statement logic would look similar to:

# iterrows() yields (index, row) pairs; row['Value1'] reads one cell of the row
for _, row in df.iterrows():
    if row['Value1'] == 'JFK':
        print('John F Kennedy')
    else:
        print('N/A')
    if row['Value2'] == 'NYC':
        print('New York City')
    else:
        print('N/A')
    if row['Value3'] == 'Yes':
        print('Able')
    elif row['Value3'] == 'No':
        print('Unable')

I would like the output to be a dataframe as well, pairing each key with a concatenation of all the print statements produced by the logic above. For the first row it would look something like this:

result_df = pd.DataFrame({'Key':    ['1234'], 
              'Result': ['John F Kennedy, New York City, Able']})

I've simplified the logic above to condense it but in reality the logic would have way more conditions to meet.

Any overall help would be great. Just need a kick in the right direction to have pandas iterate through the columns of a row.

Thank you!


Resolution:

Given that my question might not have been the clearest, I dug into numpy and pandas a bit more and found a solution for those who may encounter a similar issue to me.

I wanted to find a way of going through a dataframe and, based on the contents of specific columns, creating another column with the results. This would essentially mimic an IF statement in Excel. Several posters had discouraged using iterrows(), so I didn't go down that route. Instead, I found np.select() in numpy.
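As a minimal sketch of how np.select() behaves, here is an example on a plain array. The scores and labels are made up for illustration and are not part of the data in this question. Each position takes the value of the first condition that is True, and `default` is used where no condition matches.

```python
import numpy as np

# Hypothetical data, used only to illustrate np.select()
scores = np.array([95, 72, 40, 85])

# Conditions are checked in order; the first True one wins for each element
conditions = [scores >= 90, scores >= 70]
labels = ["high", "medium"]

# default fills every position where none of the conditions is True
result = np.select(conditions, labels, default="low")
print(result)
```

Because the first matching condition wins, 95 is labelled "high" even though it also satisfies `scores >= 70`, so the order of the conditions matters.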

Assume the following dataframe:

data = {'ID':        ['1254','4568','9547','7856'],
        'Primary':   [True, False, True, False],
        'Secondary': [True, False, False, True]}
df = pd.DataFrame(data)
print(df)

I want the result of Tertiary (new column) to be a function of the contents of Primary and Secondary. For example, I want Tertiary to equal "Yes" when Primary is equal to True and Secondary is equal to True. Instead of iterating, I used np.select():

import pandas as pd
import numpy as np
data = {'ID':        ['1254','4568','9547','7856'],
        'Primary':   [True, False, True, False],
        'Secondary': [True, False, False, True]}
df = pd.DataFrame(data)
print(df)

primary_secondary_flag_conditions = [
    (df['Primary'] == True)  & (df['Secondary'] == True),
    (df['Primary'] == False) & (df['Secondary'] == False),
    (df['Primary'] == True)  & (df['Secondary'] == False),
    (df['Primary'] == False) & (df['Secondary'] == True)
]
primary_secondary_flag_values = [
    "Yes",
    "No",
    "Maybe",
    "Maybe"
]
df['Tertiary'] = np.select(primary_secondary_flag_conditions, primary_secondary_flag_values, None)

# replacing all 'None' values with empty string ""
df.fillna("",inplace=True)

print(df)

The resulting dataframe appears like so:

import pandas as pd
import numpy as np
data = {'ID':        ['1254','4568','9547','7856'],
        'Primary':   [True, False, True, False],
        'Secondary': [True, False, False, True],
        'Tertiary':  ["Yes", "No", "Maybe", "Maybe"]}
df = pd.DataFrame(data)
print(df)
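The same idea can probably be applied to the dataframe from the original question. The following is only a sketch under a few assumptions: one np.select() call per column, 'N/A' as the fallback for Value1 and Value2 (as in the if/else logic above), an empty string when Value3 is neither 'Yes' nor 'No', and the variable names `airport`, `city`, and `ability` are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Key':    ['1234', '4321', '2341', '4132'],
                   'Value1': ['JFK', 'LAX', 'ATL', 'CLT'],
                   'Value2': ['NYC', 'CA', 'GA', 'NC'],
                   'Value3': ['Yes', 'No', 'No', 'No']})

# One np.select() per column; default plays the role of the else branch
airport = np.select([df['Value1'] == 'JFK'], ['John F Kennedy'], default='N/A')
city = np.select([df['Value2'] == 'NYC'], ['New York City'], default='N/A')
# Assumption: empty string when Value3 is neither 'Yes' nor 'No'
ability = np.select([df['Value3'] == 'Yes', df['Value3'] == 'No'],
                    ['Able', 'Unable'], default='')

# Join the three per-row results into a single comma-separated string
result_df = pd.DataFrame({
    'Key': df['Key'],
    'Result': [', '.join(parts) for parts in zip(airport, city, ability)],
})
print(result_df)
```

With more conditions per column, only the condition and value lists grow; the final join stays the same.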
datoro
  • Iterating on a pandas dataframe is not recommended. Please provide your sample input/output, explain the logic clearly, and show what you have tried. – Mohamed Thasin ah Aug 05 '22 at 15:27
  • If you truly need to iterate, you can use [dataframe.iterrows()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html) or [dataframe.apply()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html), but for more specific answers please [edit] to include a [mcve] with sample input and expected output as text in the body of the question, not as a picture or link – G. Anderson Aug 05 '22 at 15:29
  • `.apply()` > `iterrows()`, but either should only be used if iteration is the only possible method. – BeRT2me Aug 05 '22 at 15:33
  • Please see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and post your data as text, NOT an image. Include example desired output and logic. – BeRT2me Aug 05 '22 at 15:35
  • Edited. I appreciate the help. I'm not an expert with pandas by any means, so I'm looking for a way to get the final result even if it's only printed on the terminal. I'd work to put it into a CSV later on, but that's not the priority right now. Thanks again! – datoro Aug 05 '22 at 16:04
  • I've closed the question because it looks to me that the "complex logic" boils down to replacing a string with another string in a cell (and perhaps producing a CSV out of a dataframe). If there is more to it, feel free to include additional details. – norok2 Aug 05 '22 at 16:21
  • I'm not replacing the string with another string hence the need to iterate through the columns. I'm looking at all columns of a row, depending on what the cells contain, there is logic to apply, then the final result would be to compile all those results into a dataframe with 2 columns: column A key, column B result of logic. I'm not sure how else to explain it but the suggested answer is not what I'm looking for. – datoro Aug 05 '22 at 16:24
  • Is it that each cell depends on the value of an entire column? Or on the value of some other cell from another row? The example you gave seems to point to each row being independent. At worst, you need to create a new column combining values/replacements from various cells within the same row. If that is the case, the answers in those questions have you covered. If that is not the case, perhaps you should provide a more articulated example? – norok2 Aug 05 '22 at 16:33
  • Not sure how I would clarify it further. I'll just scour other pages. Thanks anyway. – datoro Aug 05 '22 at 16:35
  • You may be running into an [XY-problem](https://en.wikipedia.org/wiki/XY_problem). You want to iterate because you think this is what should solve your problem. However, going from your input df to your output df requires no explicit iteration (and it would not be the recommended approach in pandas anyway). And the means of achieving this are contained in the answers to those questions I marked as duplicate. If you feel you still cannot do this, perhaps a question asking explicitly for that is in order. If you feel the example input/output do not represent your problem, please change those. – norok2 Aug 05 '22 at 16:44
  • @norok2 - If you could reopen the question, I can post what I did to solve it. I ended up using conditional arguments with `np.select()`. This way, the code looks at specific columns in the CSV and outputs values according to which conditions are met. – datoro Aug 09 '22 at 13:57
  • @datoro I recommend writing it in the question as a tentative solution. Perhaps it helps others better understand what you are after, and it will allow other users to suggest better alternatives. – norok2 Aug 09 '22 at 14:08
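For completeness, the row-wise `.apply()` route mentioned in the comments could look roughly like the sketch below. It assumes the dataframe and if/else logic from the question, and `describe_row` is a hypothetical helper name. As the comments note, this is generally slower than a vectorized approach and is best reserved for logic that cannot be expressed column-wise.

```python
import pandas as pd

df = pd.DataFrame({'Key':    ['1234', '4321', '2341', '4132'],
                   'Value1': ['JFK', 'LAX', 'ATL', 'CLT'],
                   'Value2': ['NYC', 'CA', 'GA', 'NC'],
                   'Value3': ['Yes', 'No', 'No', 'No']})

def describe_row(row):
    """Hypothetical helper: apply the question's if/else logic to one row."""
    parts = [
        'John F Kennedy' if row['Value1'] == 'JFK' else 'N/A',
        'New York City' if row['Value2'] == 'NYC' else 'N/A',
    ]
    if row['Value3'] == 'Yes':
        parts.append('Able')
    elif row['Value3'] == 'No':
        parts.append('Unable')
    return ', '.join(parts)

# axis=1 passes each row to describe_row as a Series indexed by column name
result_df = pd.DataFrame({'Key': df['Key'],
                          'Result': df.apply(describe_row, axis=1)})
print(result_df)
```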
