1

Really new to python and need a bit of help with a question I have to complete.

I need to find the average earned per time period (month/year), based on a user input for month (MM) and year (YYYY).

I have the inputs as follows:

year_value = int(input("Year (YYYY): "))
month_value = int(input("Month (MM): "))

My dataframe looks like this:

Race_Course   Horse Name      Year   Month   Day   Amount_won_lost   Won/Lost
Aintree       Red Rum         2017   5       12    11.58             won
Punchestown   Camelot         2016   12      22    122.52            won
Sandown       Beef of Salmon  2016   11      17    20.0              lost
Ayr           Corbiere        2016   11      3     25.0              lost
Fairyhouse    Red Rum         2016   12      2     65.75             won
Ayr           Camelot         2017   3       11    12.05             won
Aintree       Hurricane Fly   2017   5       12    11.58             won
Punchestown   Beef or Salmon  2016   12      22    112.52            won
Sandown       Aldaniti        2016   11      17    10.0              lost
etc.

I have two problems:

  1. how do I group the data together based on the inputs and sum the Amount_won_lost values that match, and
  2. how do I make sure that when summing the values together that the value in Amount_won_lost is negative when Won/Lost = lost and stays positive for when Won/Lost = won

Any help would be very much appreciated! I've been stuck on this for a few hours and can't seem to figure it out.

The output should look something like this, but anything that prints the result would be perfect, I don't mind how it looks:

Year    Month    Amount_won_lost
2016    11       €-55.00
awhelton
  • 15
  • 5

2 Answers2

0

please try this

by_year = race_data[race_data['Year']==year_value ]
by_month = by_year[by_year['Month']==year_value ]
print(by_month['Amount_won_lost'].sum())

i hope it helps

ps : data is a pandas DataFrame

nassim
  • 1,547
  • 1
  • 14
  • 26
  • Hi nassim, I tried this but I got this error: `File "assignment1.py", line 91, in average_amount_spent_per_period by_year = data[data['Year']==year_value ] NameError: name 'data' is not defined` Do you know how to define data? I have imported pandas but don't know what else to do – awhelton Apr 21 '19 at 20:41
  • data is a dataframe that contains your dataset, can you please give me the name of the file you are loading your dataset from ? so that i update the answer accordingly – nassim Apr 21 '19 at 20:44
  • I load the data using this: `race_data = pd.read_csv('edit_results.csv')` – awhelton Apr 21 '19 at 20:45
  • hi nassim, I tried that but got a different error and I don't know what it means. This is the error, `FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison / result = method(y) / 0.0` . Do you know what this means? – awhelton Apr 21 '19 at 21:11
  • This is a numpy bug.In your case it could be triggered by pandas in some comparaison. https://stackoverflow.com/questions/40659212/futurewarning-elementwise-comparison-failed-returning-scalar-but-in-the-futur some one posted a realy good and detailed answer about this bug, you should read it – nassim Apr 21 '19 at 21:18
0

You can first change the signs of your Amount_won_lost attribute by using pd.DataFrame.apply()

So for the following line:

df["Amount_won_lost"] = df.apply(lambda x: -x["Amount_won_lost"] \
    if x["Won/Lost"] == "lost" else x["Amount_won_lost"], axis = 1)

It will replace your Amount_won_lost column with either a positive or negative value depending on if they won or lost.

And by utilizing sum() function mentioned in the other comments, you can get the sum for the Amount_won_lost for that given year and month.

The following will select all the values at your inputted values:

df[(df["Year"] == year_value) & (df["Month"] == month_value)]

The output would be this:

  Race_Course       HorseName  Year  Month  Day  Amount_won_lost Won/Lost
2     Sandown  Beef of Salmon  2016     11   17            -20.0     lost
3         Ayr        Corbiere  2016     11    3            -25.0     lost
8     Sandown        Aldaniti  2016     11   17            -10.0     lost

print(df[(df["Year"] == year_value) & (df["Month"] == month_value)]["Amount_won_lost"].sum())

will print -55.0.

If you had wanted to see the sums for every given month in a year, without using user inputs, the groupby function is your best bet!

Karm
  • 76
  • 4
  • Thank you for such a detailed answer Karm! I entered the four lines of code that you have above but got an error saying that 'year' wasn't defined in the last line of code (the print function). I swapped in `year_value` for `year`, and got the following error: `FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison result = method(y) assignment1.py:103: UserWarning: Boolean Series key will be reindexed to match DataFrame index.`. Do you know how I would fix this? Sorry I'm really not sure how to fix this! – awhelton Apr 21 '19 at 21:09
  • @awhelton did you also change month to month_value? – Karm Apr 21 '19 at 21:34
  • @awhelton You will also have to assign ```df[df["Year"] == year_value][df["Month"] == month_value]``` to it's own variable such as ```df2```. Otherwise, it will perform the actions but not store the result anywhere. You should then run the following: ```print(df2[df2["Year"] == year][df2["Month"] == month]["Amount_won_lost"].sum())```. – Karm Apr 21 '19 at 21:39
  • how do I assign `df[df["Year"] == year_value][df["Month"] == month_value]` to its own variable do I write `df2 = df[df["Year"] == year_value][df["Month"] == month_value]` ? – awhelton Apr 21 '19 at 21:50
  • @awhelton yup! :) – Karm Apr 21 '19 at 21:52
  • I've tried that and I'm getting 0.0 as a result for all of the inputs - will I copy in all the code I have so you could check it? – awhelton Apr 21 '19 at 22:01
  • @awhelton sure! – Karm Apr 21 '19 at 22:03
  • `df = pd.read_csv('edit_results.csv') / year_value = input("Year (YYYY): ") / month_value = input("Month (MM): ") / df = pd.read_csv('edit_results.csv') / df["Amount_won_lost"] = df.apply(lambda x: -x["Amount_won_lost"] / if x["Won/Lost"] == "lost" else x["Amount_won_lost"], axis = 1) / df2 = df[df["Year"] == year_value][df["Month"] == month_value] / df3 = df2[df2["Year"] == year_value][df2["Month"] == month_value]["Amount_won_lost"].sum() / print(df3)` . I've separated each line with /. Thanks Karm :) – awhelton Apr 21 '19 at 22:08
  • @awhelton Your ```df3 = df2[df2["Year"] == year_value][df2["Month"] == month_value]["Amount_won_lost"].sum()``` needs to modified to ```df3 = df2["Amount_won_lost"].sum()```. I gave you the wrong advice earlier. Since you assigned df2 to it's own variable, you no longer needed to do the transforms again. Apologies. Let me know if this works! – Karm Apr 21 '19 at 22:20
  • No don't worry at all Karm, thanks for your help. I've made that change and this is what I get: `assignment1.py:101: UserWarning: Boolean Series key will be reindexed to match DataFrame index. / df2 = df[df["Year"] == year_value][df["Month"] == month_value] / 0.0 ` So still not working unfortunately. I'll try a few more things. Thanks for your help! – awhelton Apr 21 '19 at 22:34
  • @awhelton swap the line ```df2 = df[df["Year"] == year_value][df["Month"] == month_value``` for ```df2 = df[(df["Year"] == year_value) & (df["Month"] == month_value)]``` I am not getting any errors with this. This again seems to be an error in my suggestion, Apologies for making this way harder than it needed to be. – Karm Apr 21 '19 at 22:44
  • okay no more errors! Still getting 0.0 as the output for every input though :( – awhelton Apr 21 '19 at 22:47
  • @awhelton Not sure then honestly. This is the code I have and it is outputting for the 2016, 11 case. ```df["Amount_won_lost"] = df.apply(lambda x: -x["Amount_won_lost"] if x["Won/Lost"] == "lost" else x["Amount_won_lost"], axis = 1) \ print(df[(df["Year"] == year_value) & (df["Month"] == month_value)]["Amount_won_lost"].sum())``` Sorry I couldn't figure it out, it must be some simple error that you should be able to get after troubleshooting some more – Karm Apr 21 '19 at 22:52
  • was looking over everything again and I found the problem! I needed to change the inputs to integers because of a conflict between python and numpys. Changed the inputs to `year_value = int(input("Year (YYYY): ")) month_value = int(input("Month (MM): "))`. Thanks for your help! – awhelton Apr 24 '19 at 09:19
  • @awhelton Great to hear! Sorry I didn't catch that. Yes, when you use input to get a value entered from the user, it will be a String, hence why you need to cast it to an Integer using ```int()```. Happy to see you got it working! – Karm Apr 25 '19 at 01:59