I have a dataframe with temperature data for a certain period. From this data, I want to calculate the relative frequency of August being at least 20° and of January being at most 2°. I have already extracted these two columns into a separate dataframe, counted how often each temperature value occurs, and used value_counts with normalize=True to get the frequency of each value in percent (see code).

df_temp1[df_temp1.aug >=20]
df_temp1[df_temp1.jan <= 2]

df_temp1['aug'].value_counts()
df_temp1['jan'].value_counts()

df_temp1['aug'].value_counts(normalize=True)*100
df_temp1['jan'].value_counts(normalize=True)*100

What I haven't managed is to calculate the relative frequency of aug >= 20, of jan <= 2, of aug >= 20 AND jan <= 2, and of aug >= 20 OR jan <= 2. Could someone help me with this? Thanks.

  • text `did not get a satisfying result.` is totally useless for us. I don't understand what you want to do. Better show minimal working code - with some example data in code - and what you get and what you expect. – furas Oct 31 '21 at 10:41
  • and put all information in question, not in comment - they will be more readable (because you can't format code in comment) and more people will see it so more people may help you. – furas Oct 31 '21 at 15:26
  • Welcome to stackoverflow. Here is a piece of CONSTRUCTIVE criticism. Do the following thing. Do ```print(df.head(40))``` and paste the result in your question. Don't forget to put it between ``` ``` so it becomes readable. Paste code you've tried in the same way. Your first experience with SO should be a pleasant one. It is also stated that we should be kind to newbies. – Serge de Gosson de Varennes Oct 31 '21 at 16:51

1 Answer

I would try something like this:

proportion_of_augusts_at_least_20 = (df_temp1['aug'] >= 20).mean()
proportion_of_januaries_at_most_2 = (df_temp1['jan'] <= 2).mean()

This calculates it in two steps. First, df_temp1['aug'] >= 20 creates a boolean Series, with True for each August at or above 20 and False for each one below it.

Then, mean() treats True and False as 1 and 0. The mean is therefore the fraction of months that meet the criterion, a number between 0 and 1; multiply it by 100 to get a percentage.
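The same idea should extend to the combined conditions from the question: two boolean Series can be combined element-wise with & (AND) and | (OR) before taking the mean. Here is a minimal sketch; the sample values are made up for illustration (I don't have your data), and it assumes each row of df_temp1 is one year.

```python
import pandas as pd

# Hypothetical sample data: one row per year, temperatures in degrees.
df_temp1 = pd.DataFrame({
    "aug": [21.0, 19.5, 20.0, 18.2, 22.3],
    "jan": [1.5, 3.0, 2.0, -0.5, 4.1],
})

# Boolean Series, one entry per row.
warm_aug = df_temp1["aug"] >= 20
cold_jan = df_temp1["jan"] <= 2

# The mean of a boolean Series is the fraction of True values.
freq_aug = warm_aug.mean()
freq_jan = cold_jan.mean()

# Element-wise AND / OR; use & and |, not the Python keywords and / or.
freq_both = (warm_aug & cold_jan).mean()
freq_either = (warm_aug | cold_jan).mean()

print(freq_aug * 100, freq_jan * 100, freq_both * 100, freq_either * 100)
```

As a sanity check, freq_either should equal freq_aug + freq_jan - freq_both. Note that a comparison against a missing value (NaN) gives False, so rows with missing temperatures count as not meeting the criterion unless you drop them first.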

As an aside, I would recommend posting a sample of your data in the question, so that people answering can check whether their solution works.

Nick ODell