I have a dataframe named Exam that looks like

Col A      Col B      Col C     Col D     Col E     Col F
  1          1         Jan       2.5       2.5       Yes
  1          2         Jan       2.4       2.5       Yes
  2          3         Jan       2.4       2.5       Yes
  2          4         Feb       2.3       2.4       No
  2          5         Feb       2.5       2.6       No
  3          6         Mar       2.4       2.6       Yes
  3          7         Mar       2.5       2.5       Yes

I want to check the value in Col F and store the result in a new column called Col G, but the condition for the first row of the dataframe is different from the one for the remaining rows. I have the following script:

for i in Exam.index:
    def val(df):
        if i == 0:
            if df["Col F"] == "Yes":
                return "In"
            if df["Col F"] == "No":
                return "Out"
        if i != 0:
            if df["Col F"] == "Yes":
                return "In2"
            if df["Col F"] == "No":
                return "Out2"

Exam["Col G"] = Exam.apply(val, axis=1)

Exam

The script returns:

Col A      Col B      Col C     Col D     Col E     Col F     **Col G**
  1          1         Jan       2.5       2.5       Yes       **In2**
  1          2         Jan       2.4       2.5       Yes       **In2**
  2          3         Jan       2.4       2.5       Yes       **In2**
  2          4         Feb       2.3       2.4       No        **Out2**
  2          5         Feb       2.5       2.6       No        **Out2**
  3          6         Mar       2.4       2.6       Yes       **In2**
  3          7         Mar       2.5       2.5       Yes       **In2**

but I want it to return:

Col A      Col B      Col C     Col D     Col E     Col F     **Col G**
  1          1         Jan       2.5       2.5       Yes       **In**
  1          2         Jan       2.4       2.5       Yes       **In2**
  2          3         Jan       2.4       2.5       Yes       **In2**
  2          4         Feb       2.3       2.4       No        **Out2**
  2          5         Feb       2.5       2.6       No        **Out2**
  3          6         Mar       2.4       2.6       Yes       **In2**
  3          7         Mar       2.5       2.5       Yes       **In2**

The loop isn't applying the first-row condition to Col F. This seems like it should be easy, but I'm not sure what I am doing wrong. Thanks!

martineau
Nerd72

1 Answer

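The simplest solution would probably be to modify the val function to take i and the Col F value as arguments, and use enumerate(Exam['Col F']) as inputs to val. In your script, the loop only redefines val on every iteration; by the time apply calls it, i holds the last index value, so the i == 0 branch never runs.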

def val(i, f):
    if i == 0:
        if f == "Yes":
            return "In"
        if f == "No":
            return "Out"
    if i != 0:
        if f == "Yes":
            return "In2"
        if f == "No":
            return "Out2"

Exam["Col G"] = [val(i, f) for i, f in enumerate(Exam["Col F"])]
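
For reference, here is a minimal self-contained sketch of the approach above. It assumes pandas is installed and rebuilds only Col F from the sample data; the `suffix` variable and the `None` fallback for unexpected values are my own additions, not part of the original script.

```python
import pandas as pd

# Only Col F from the question's sample data; the other columns are omitted for brevity.
Exam = pd.DataFrame({"Col F": ["Yes", "Yes", "Yes", "No", "No", "Yes", "Yes"]})

def val(i, f):
    # The first row (position 0) gets "In"/"Out"; every later row gets "In2"/"Out2".
    suffix = "" if i == 0 else "2"
    if f == "Yes":
        return "In" + suffix
    if f == "No":
        return "Out" + suffix
    return None  # assumption: any other value is left empty

# enumerate supplies the row position, so this works whatever the index labels are.
Exam["Col G"] = [val(i, f) for i, f in enumerate(Exam["Col F"])]
print(Exam)
```

Because enumerate counts positions rather than reading the index, the first row is treated specially even if the dataframe has been filtered or re-indexed.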

If you need to use more than one column in your calculation, you can use zip:

def val(i, e, f): ...

Exam['Col G'] = [val(i, e, f) for i, (e, f) in enumerate(zip(Exam['Col E'], Exam['Col F']))]
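
As an illustration of the zip variant, here is a small sketch, assuming pandas is installed. The rule that uses Col E (append "+" when Col E is above 2.5) is purely hypothetical and only there to show a second column being passed in; substitute whatever your real calculation needs.

```python
import pandas as pd

# Col E and Col F from the question's sample data.
Exam = pd.DataFrame({
    "Col E": [2.5, 2.5, 2.5, 2.4, 2.6, 2.6, 2.5],
    "Col F": ["Yes", "Yes", "Yes", "No", "No", "Yes", "Yes"],
})

def val(i, e, f):
    # Same labels as before; anything other than "Yes" is treated as "No" here.
    label = ("In" if f == "Yes" else "Out") + ("" if i == 0 else "2")
    # Hypothetical extra rule: mark rows whose Col E value exceeds 2.5.
    return label + "+" if e > 2.5 else label

# zip pairs the two columns row by row; enumerate adds the row position.
Exam["Col G"] = [
    val(i, e, f)
    for i, (e, f) in enumerate(zip(Exam["Col E"], Exam["Col F"]))
]
print(Exam)
```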
Roy Cohen