0

I am trying to compare the values of a particular column with one of columns, and store the result to a new column based upon their comparison index, say: Low if value differ by more than 10%, and OK if otherwise.

df["Index"] = ""
def function(df):
    for i in range(1, len(df.columns)-2):
        if((df.columns.values[1]) == (df.columns.values[i+1])):
            if((df.iloc[:,1]) < (0.9 * df.iloc[:,i+1])):
                df["Index"] = "Low"
            else:
                df["Index"] = "OK"
function(df)

What is the relation of

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

to this? Would be great if someone can as well suggest ways to reduce the time complexity using same code structure.

  • See `Series.shift` and `np.where` – ansev Jan 03 '20 at 11:49
  • Do you mind to provide a [mcve](/help/mcve)? – rpanai Jan 03 '20 at 11:52
  • 1
    `df['Index'] = np.where ( (df.iloc[:,1]) < (0.9 * df.iloc[:,i+1]),'Low','Ok')`.... you could show an example of your dataframe and your expected output – ansev Jan 03 '20 at 11:52
  • can u show an example expected input and output.? – Nayana Madhu Jan 03 '20 at 11:53
  • The data is in following structure: `df = {'char':['A', 'B', 'C', 'D'],'Age':[20, 21, 19, 18],'Age1':[29, 27, 25, 26],'Age2':[60, 48, 55, 62], 'Age3':[60, 48, 55, 62],'Age4':[60, 48, 55, 62],'Age5':[18, 19, 17, 12]}` Here, Age gets compared with one of the Age group, and if the values in Age is, say 90% less than that of Age1, df[Index] is assigned as Low, and if Age is 110% higher than Age1, Index is assigned as High, else Average. Age is the fixed column `df.iloc[:,1]`. –  Jan 03 '20 at 12:03
  • @ansev it returned the values of `df.iloc[:,i+1]` to index. –  Jan 03 '20 at 12:05
  • @PukarAcharya do you want to check `Age` against any of the columns after it? And if one of them is 90% less then result in 'Ok'? – davidbilla Jan 03 '20 at 12:20
  • @davidbilla Yes. If it is more by 1.1 times, then index is high. If it's less by 0.9 times, then the index is low, and OK if otherwise. –  Jan 03 '20 at 12:24
  • 1
    @PukarAcharya check my answer.. this is close to ansev's – davidbilla Jan 03 '20 at 12:26

2 Answers2

0

Do you want to check Age against any of the columns after it? And if one of them is 90% less then result in 'Ok'? I am not sure if the logic you need is this?

df = pd.DataFrame({'char':['A', 'B', 'C', 'D'],'Age':[20, 21, 19, 18],'Age1':[29, 27, 25, 26],'Age2':[60, 48, 55, 62], 'Age3':[60, 48, 55, 62],'Age4':[60, 48, 55, 62],'Age5':[18, 19, 17, 12]})

df["Index"] = ""

def function(df):
    for i in range(1, len(df.columns)-2):
        df['Index'] = np.where(df.iloc[:, 1] < 0.9 * df.iloc[:, i + 1], 'Low', 'Ok')


function(df)

Output:

  char  Age  Age1  Age2  Age3  Age4  Age5 Index
0    A   20    29    60    60    60    18    Ok
1    B   21    27    48    48    48    19    Ok
2    C   19    25    55    55    55    17    Ok
3    D   18    26    62    62    62    12    Ok

If I change one of the Age values to 1 so that the Age[i+1] columns are more than 90% then result is

  char  Age  Age1  Age2  Age3  Age4  Age5 Index
0    A    1    29    60    60    60    18   Low
1    B   21    27    48    48    48    19    Ok
2    C   19    25    55    55    55    17    Ok
3    D   18    26    62    62    62    12    Ok
davidbilla
  • 2,120
  • 1
  • 15
  • 26
  • `df['Index']= [np.where(df.iloc[:, 1] < 0.9 * df.iloc[:, i + 1], 'Low', 'Ok') for i in range(1, len(df.columns)-2)]`. – ansev Jan 03 '20 at 12:32
  • 1
    `df['Index'] = ''` ...not neccesary – ansev Jan 03 '20 at 12:33
  • 1
    isn't this the same as `df['Age'] < (df[column-1] * 0.9) ?` feels like a strange way to loop when the last value will be the absoloute..? – Umar.H Jan 03 '20 at 12:41
  • Yes, I wasn't sure on the logic in the question, so I just did a edit to the code snippet in the question to ask if what I was thinking was right? We can remove the loop and be more pythonic, absolutely. I still don't understand the logic behind this, as it is always going to use the last column. – davidbilla Jan 03 '20 at 12:50
0

I'm not quite sure about the logic behind your question but if there are many columns you could consider to use something like

import pandas as pd
df = pd.DataFrame({'char':['A', 'B', 'C', 'D'],
                   'Age':[20, 21, 19, 18],
                   'Age1':[29, 27, 25, 26],
                   'Age2':[60, 48, 55, 62], 
                   'Age3':[60, 48, 55, 62],
                   'Age4':[60, 48, 55, 62],
                   'Age5':[18, 19, 17, 12]})

cols2compare = df.columns[df.columns.str.startswith("Age")]

diz = {True:"Ok", False:"Low"}

df["Index"] = df[cols2compare].apply(lambda x: x["Age"] < x.max()*.9, 
                                     axis=1).map(diz)

rpanai
  • 12,515
  • 2
  • 42
  • 64