I am trying to create a dataframe.
df = pd.DataFrame(columns=["Year", "Fuel", "Status", "Sex", "Service", "Expected"])
The other columns contain data created using np.random.
Within the "Expected" column I would like to put Pass or Fail depending on two conditions. If the mileage is less than 100000 and the service is Yes, then it is a pass; otherwise it's a fail.
This is what I have so far
df["Expected"] = df.loc[(df['Mileage']< 100000) | (df['Service'] == 'Yes', "Pass", "Fail")]
It is bringing up the error message
ValueError: operands could not be broadcast together with shapes (500,) (3,)
I have filled the other columns with 500 rows of data, but I am not sure what the 3 relates to. Possibly the Yes, Pass, Fail values.
I also tried df['Expected'] = np.where(df ["Mileage"] < 132352, ['Service'] == "Yes",'Pass','Fail')
which kind of worked.
Am I on the wrong track?
Any help or pointers would be appreciated.
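
For reference, here is a minimal sketch of the rule described above (Pass only when the mileage is under 100000 and the service is Yes). It assumes a numeric "Mileage" column, which is not in the column list shown, and "Service" values of "Yes"/"No". The sample data is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the np.random-filled columns
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Mileage": rng.integers(0, 200_000, size=n),
    "Service": rng.choice(["Yes", "No"], size=n),
})

# Both conditions must hold, so they are combined with & (and) rather than | (or).
# Each comparison needs its own parentheses.
passed = (df["Mileage"] < 100_000) & (df["Service"] == "Yes")

# np.where takes exactly three arguments: condition, value if true, value if false
df["Expected"] = np.where(passed, "Pass", "Fail")

print(df["Expected"].value_counts())
```

The key points in this sketch are that the two tests are merged into a single boolean Series before being handed to np.where, and that "Pass" and "Fail" are passed as separate arguments rather than inside the condition's parentheses.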