0

I am trying to create a dataframe.

df = pd.DataFrame(columns=["Year", "Fuel", "Status", "Sex", "Service", "Expected"])

The other columns contain data created using np.random.

Within the "Expected" column I would like to input Pass or Fail depending on a few conditions. If the mileage is less than 100000 and the service is Yes, then it passes; otherwise it's a fail.

This is what I have so far

df["Expected"]  = df.loc[(df['Mileage']< 100000) | (df['Service'] == 'Yes', "Pass", "Fail")]

It is bringing up the error message

ValueError: operands could not be broadcast together with shapes (500,) (3,) 

I have filled the other columns with 500 rows of data, but I am not sure what the 3 relates to. Possibly the Yes, Pass and Fail values.

I also tried df['Expected'] = np.where(df ["Mileage"] < 132352, ['Service'] == "Yes",'Pass','Fail') which kind of worked.

Am I on the wrong track?

Any help or pointers would be appreciated.

CDJB
bexi

2 Answers

1

I'd create a function that takes a pd.Series object (one row of the DataFrame) as its only argument and returns the value for that row's cell. Then use df.apply(lambda row: your_function(row), axis=1). So:

def your_function(row):
    # fill in your other conditions here
    if row["Mileage"] < 132352 and row["Service"] == "Yes":
        return "Pass"
    else:
        return "Fail"

df["Expected"] = df.apply(lambda row: your_function(row), axis=1)
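
As a rough, self-contained sketch of the same approach: the four-row DataFrame below is made-up sample data (only the column names come from the question), and the threshold of 100000 is the one stated in the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "Mileage": [50000, 150000, 80000, 120000],
    "Service": ["Yes", "Yes", "No", "No"],
})

def your_function(row):
    # With axis=1, row is a pd.Series holding a single row,
    # so row["Service"] is one value, not a whole column.
    if row["Mileage"] < 100000 and row["Service"] == "Yes":
        return "Pass"
    return "Fail"

# Passing the function directly is equivalent to wrapping it in a lambda.
df["Expected"] = df.apply(your_function, axis=1)
print(df)
```

Only the first row satisfies both conditions here, so it is the only one marked Pass.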
user1558604
  • On the last line, pd.apply raises "AttributeError: module 'pandas' has no attribute 'apply'", so I changed it to df.apply, which then raises "ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 1')". I then used `def fun(row): if row["Mileage"] <100000 and df['Service'].any(): return "Pass" else: return "Fail" df["Expected"] = df.apply(lambda row: fun(row), axis=1)` Thank you. – bexi Dec 10 '19 at 14:57
  • Yes, sorry. it should be `df.apply`. I'm confused on what you are doing with the `.any()`. Could you explain that a bit more? – user1558604 Dec 10 '19 at 15:00
  • would it make sense to do `df["service"] = "Yes"` – user1558604 Dec 10 '19 at 15:02
  • If I use just `def fun(row): if row["Mileage"] <100000 and df['Service']:` I get the error message ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 2'). So I added .any() after I looked up https://stackoverflow.com/questions/53830081/python-pandas-the-truth-value-of-a-series-is-ambiguous and came across this: "You are comparing two pd.Series, or a pd.Series with a value, so you might have multiple True and multiple False values, you have to do instead: if (data == ask_minute['lastUpdated']).any()" – bexi Dec 10 '19 at 15:03
  • oh, sorry. Use `row["Service"]`. `row` in your function is a pd.Series of one row of your df. – user1558604 Dec 10 '19 at 15:29
  • Works perfectly. Thank you for all the time you put into this. – bexi Dec 10 '19 at 17:11
1

You could simply fill the Expected column with 'Fail':

df['Expected'] = 'Fail'

And then:

df.loc[(df['Mileage'] < 100000) & (df['Service'] == 'Yes'), 'Expected'] = 'Pass'
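
A minimal runnable sketch of this default-then-overwrite idea, written here with df.loc and a boolean mask. The sample rows are invented for illustration, and the extra Expected_where column is a hypothetical name used only to show that np.where with the same mask appears to give the same result in one step.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "Mileage": [50000, 150000, 80000, 120000],
    "Service": ["Yes", "Yes", "No", "No"],
})

# Both conditions must hold, so combine them with & (not |).
# Each comparison needs its own parentheses.
mask = (df["Mileage"] < 100000) & (df["Service"] == "Yes")

# Start with the default, then overwrite only the rows where the mask is True.
df["Expected"] = "Fail"
df.loc[mask, "Expected"] = "Pass"

# np.where takes exactly three arguments: condition, value if True, value if False.
df["Expected_where"] = np.where(mask, "Pass", "Fail")
print(df)
```

The mask is a single boolean Series with one entry per row, which avoids the shape mismatch from the question, where conditions and labels were mixed inside one tuple.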