Loop where the condition for the first entry in a column is different from the remaining column entries?

Question

I have a dataframe named Exam that looks like

Col A      Col B      Col C     Col D     Col E     Col F
  1          1         Jan       2.5       2.5       Yes
  1          2         Jan       2.4       2.5       Yes
  2          3         Jan       2.4       2.5       Yes
  2          4         Feb       2.3       2.4       No
  2          5         Feb       2.5       2.6       No
  3          6         Mar       2.4       2.6       Yes
  3          7         Mar       2.5       2.5       Yes

I want to check the value of Col F and store the result in a new column called Col G, but the condition for the first row of the dataframe is different from the condition for the remaining rows. I have the following script:

for i in Exam.index:
    def val(df):
        if i == 0:
            if df["Col F"] == "Yes":
                return "In"
            if df["Col F"] == "No":
                return "Out"
        if i != 0:
            if df["Col F"] == "Yes":
                return "In2"
            if df["Col F"] == "No":
                return "Out2"

Exam["Col G"] = Exam.apply(val, axis=1)

Exam

The script returns:

Col A      Col B      Col C     Col D     Col E     Col F     **Col G**
  1          1         Jan       2.5       2.5       Yes       **In2**
  1          2         Jan       2.4       2.5       Yes       **In2**
  2          3         Jan       2.4       2.5       Yes       **In2**
  2          4         Feb       2.3       2.4       No        **Out2**
  2          5         Feb       2.5       2.6       No        **Out2**
  3          6         Mar       2.4       2.6       Yes       **In2**
  3          7         Mar       2.5       2.5       Yes       **In2**

but I want it to return:

Col A      Col B      Col C     Col D     Col E     Col F     **Col G**
  1          1         Jan       2.5       2.5       Yes       **In**
  1          2         Jan       2.4       2.5       Yes       **In2**
  2          3         Jan       2.4       2.5       Yes       **In2**
  2          4         Feb       2.3       2.4       No        **Out2**
  2          5         Feb       2.5       2.6       No        **Out2**
  3          6         Mar       2.4       2.6       Yes       **In2**
  3          7         Mar       2.5       2.5       Yes       **In2**

The loop isn't executing the condition for the first row in Col F. This seems like an easy thing but I'm not sure what I am doing wrong. Thanks!

Please extract and provide a [mcve], chances are you'll find the problem yourself that way. As a new user, please also take the [tour] and read [ask]. — Ulrich Eckhardt, Jan 10 '21 at 21:43
https://stackoverflow.com/questions/3431676/creating-functions-in-a-loop — SuperStormer, Jan 10 '21 at 21:45
@UlrichEckhardt Except for columns A through E, which are not important, this seems pretty minimal to me. — Roy Cohen, Jan 10 '21 at 21:47

score 0 · Answer 1 · answered Jan 10 '21 at 22:02

The simplest solution would probably be to modify the val function to take i and the Col F value as arguments, and to use enumerate(Exam['Col F']) to supply the inputs to val.

def val(i, f):
    if i == 0:
        if f == "Yes":
            return "In"
        if f == "No":
            return "Out"
    if i != 0:
        if f == "Yes":
            return "In2"
        if f == "No":
            return "Out2"

Exam["Col G"] = [val(i, f) for i, f in enumerate(Exam["Col F"])]
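
For reference, the original script fails because Python looks up i when val is called, not when it is defined. By the time Exam.apply runs, the for loop has already finished and i holds the last index value, so the i == 0 branch is never taken. Below is a minimal self-contained sketch of the enumerate approach. It assumes only Col F matters, rebuilds that column from the question's table, and shortens the branches with a helper variable called suffix (my own naming, not part of the original code).

```python
import pandas as pd

# Assumption: only Col F is needed; values copied from the question's table.
Exam = pd.DataFrame({"Col F": ["Yes", "Yes", "Yes", "No", "No", "Yes", "Yes"]})

def val(i, f):
    # i is the row position from enumerate, f is that row's Col F value.
    suffix = "" if i == 0 else "2"  # hypothetical helper: first row gets no suffix
    if f == "Yes":
        return "In" + suffix
    if f == "No":
        return "Out" + suffix
    return None  # any value other than "Yes"/"No" is left empty

# enumerate yields (position, value) pairs, so val sees the correct i each time.
Exam["Col G"] = [val(i, f) for i, f in enumerate(Exam["Col F"])]
print(Exam)
```

Because enumerate counts positions rather than index labels, this should still treat the first row as special if the dataframe has a non-default index.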

If you need to use more than one column in your calculation, you can use zip:

def val(i, e, f): ...

Exam['Col G'] = [val(i, e, f) for i, (e, f) in enumerate(zip(Exam['Col E'], Exam['Col F']))]
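
If you would rather keep Exam.apply, another option (a different technique from the list comprehension, shown only as a sketch) is to read the row's index label from row.name inside the function. This assumes the index labels are unique, so that comparing against Exam.index[0] picks out only the first row. The frame below is rebuilt from the question's Col F values.

```python
import pandas as pd

# Assumption: only Col F is needed; values copied from the question's table.
Exam = pd.DataFrame({"Col F": ["Yes", "Yes", "Yes", "No", "No", "Yes", "Yes"]})

first_label = Exam.index[0]  # index label of the first row

def val(row):
    # With axis=1, apply passes each row as a Series whose .name is its index label.
    is_first = row.name == first_label
    if row["Col F"] == "Yes":
        return "In" if is_first else "In2"
    if row["Col F"] == "No":
        return "Out" if is_first else "Out2"
    return None  # any value other than "Yes"/"No" is left empty

Exam["Col G"] = Exam.apply(val, axis=1)
print(Exam)
```

Here the function is defined once, outside any loop, and gets the row identity from its argument instead of from an outer variable.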
