
I am currently trying to improve my Python. I have a very good grip on actual data analysis, but I am trying to start creating functions that other people can run to return results, and the code should also give informative messages to the user. Below is a simple dataset I am using to print the top 3 "Weather" descriptions for each city, but as you can see, Los Angeles only has one description.

          City        Weather
0      New York          Sunny
1      New York           Rain
2      New York         cloudy
3      New York           Rain
4      New York          Sunny
5      New York          Sunny
6      New York  partly cloudy
7      New York   thunderstorm
8      New York           Rain
9      New York         cloudy
10     New York          sunny
11     New York  partly cloudy
12     New York  partly cloudy
13     New York         cloudy
14     New York          sunny
15     New York          sunny
16     New York           rain
17       Austin           rain
18       Austin           rain
19       Austin         cloudy
20       Austin          sunny
21       Austin           rain
22       Austin  partly cloudy
23       Austin  partly cloudy
24       Austin  partly cloudy
25       Austin          Sunny
26       Austin         cloudy
27       Austin          Sunny
28       Austin          Sunny
29       Austin         cloudy
30       Austin         cloudy
31       Austin  partly cloudy
32       Austin  partly cloudy
33       Austin          Sunny
34       Austin           rain
35  Los Angeles          Sunny
36  Los Angeles          Sunny
37  Los Angeles          Sunny
38  Los Angeles          Sunny
39  Los Angeles          Sunny
40  Los Angeles          Sunny
41  Los Angeles          Sunny
42  Los Angeles          Sunny
43  Los Angeles          Sunny
44  Los Angeles          Sunny
45  Los Angeles          Sunny
46  Los Angeles          Sunny
47  Los Angeles          Sunny
48  Los Angeles          Sunny
49  Los Angeles          Sunny
50  Los Angeles          Sunny
51  Los Angeles          Sunny
52  Los Angeles          Sunny


I have created a function to output the values for each city. In my own line of work this would be fine, as I could do a few checks on the data, but other users would need to be informed that the top 3 could not be given for Los Angeles because it only has one weather description. I have tried using IF statements with value counts, but I keep getting error messages like `ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().`, and I do not think my method is correct. It is very difficult to find examples for this kind of problem. Any guidance, or even links that could help, would be appreciated!
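That ValueError usually means a whole boolean Series was used where Python expects a single True/False, e.g. `if counts < 3:`. A minimal sketch of the problem and two ways around it, using a small hypothetical frame in the same shape as the data above (`counts` and the sample rows are illustrative, not from the original code):

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    "City": ["Austin", "Austin", "Los Angeles", "Los Angeles"],
    "Weather": ["rain", "cloudy", "Sunny", "Sunny"],
})

# Series indexed by City: number of unique Weather descriptions per city
counts = df.groupby("City")["Weather"].nunique()

# `counts < 3` is a Series of booleans (one per city), so using it directly
# in an `if` asks Python to squash several booleans into one, which fails.
error_message = None
try:
    if counts < 3:
        pass
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)

# Fix 1: reduce the boolean Series to a single value explicitly
if (counts < 3).any():
    print("At least one city has fewer than 3 unique descriptions")

# Fix 2: loop over the cities, so each comparison is a plain scalar
for city, n_unique in counts.items():
    if n_unique < 3:
        print(f"{city}: only {n_unique} unique description(s)")
```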

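def weather_valuecount(df):
    # Count each Weather value per City (most common first within each city),
    # then keep the first 3 rows of every city group
    weather_valcount = df.groupby(['City']).Weather.value_counts().groupby(level=0, group_keys=False).head(3)

    return weather_valcount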

When I run the above I get the following results:

City         Weather      
Austin       partly cloudy     5
             Sunny             4
             cloudy            4
Los Angeles  Sunny            18
New York     Rain              3
             Sunny             3
             cloudy            3
Name: Weather, dtype: int64

This shows the three most frequent descriptions and their counts for each city, but Los Angeles only shows one. I would like the function to include a user message along the lines of "Cannot show top three unique Weather descriptions and count for Los Angeles as there are not 3 unique values available".
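One way to get there is to return the messages alongside the counts, so the caller decides whether to print them or write them to a file. A minimal sketch, assuming a message is wanted whenever a city has fewer than `n` unique descriptions; `weather_top_n` and `notes` are hypothetical names, and the check is case-sensitive (`Sunny` and `sunny` count as two values), just like the original function:

```python
import pandas as pd


def weather_top_n(df, n=3):
    """Hypothetical helper: return (top-n counts per City, list of notes)."""
    # Same idea as weather_valuecount: counts per City, most common first,
    # then the first n rows of each city
    counts = df.groupby("City")["Weather"].value_counts()
    top_n = counts.groupby(level=0, group_keys=False).head(n)

    # One note per city that has fewer than n unique Weather values
    notes = []
    for city, n_unique in df.groupby("City")["Weather"].nunique().items():
        if n_unique < n:
            notes.append(
                f"Cannot show the top {n} Weather descriptions for {city}: "
                f"there are only {n_unique} unique value(s) available."
            )
    return top_n, notes


# Usage with a small hypothetical sample
df = pd.DataFrame({
    "City": ["Austin"] * 4 + ["Los Angeles"] * 3,
    "Weather": ["rain", "rain", "cloudy", "sunny", "Sunny", "Sunny", "Sunny"],
})
top3, notes = weather_top_n(df)
print(top3.to_string())
for note in notes:
    print(note)
```

The returned notes can then be written to the text file right after the counts, instead of being mixed into the Series itself.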

Marie
  • What do you mean by _top three weathers_? For example, if the user asked for `Austin`, what should they get as a result? – Timeless Aug 21 '22 at 11:01
  • @L'Artiste I have added the results from my function above, which show the unique weather descriptions with the 3 highest counts. Los Angeles has only one unique weather description, so it only shows Sunny, and I would like to add a message to inform the user. – Marie Aug 21 '22 at 11:27
  • What do you mean by _inform_? Right now it looks like you are confusing the output of a function and communication with users. – Vitalizzare Aug 21 '22 at 11:35
  • @Vitalizzare That's possible; I am not sure how to approach it and can't find good examples or articles. Maybe it should be separate, but the outputs will be going into a text file, and I would like the output to inform the reader that Los Angeles only has one unique value and hence cannot show the top 3 most common strings. – Marie Aug 21 '22 at 11:42
  • @Marie, check out my answer below and let me know if it corresponds to what you're expecting. – Timeless Aug 21 '22 at 11:46
  • There's another problem: because of ties, Austin and NY have more than 3 elements that belong in the top 3. Try `nlargest(3, keep='all')` instead of `head(3)`. – Vitalizzare Aug 21 '22 at 12:01
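To illustrate the point about ties, a small sketch using the 18 Austin rows from the question: `head(3)` cuts the three-way tie at a count of 4, while `nlargest(3, keep="all")` keeps every value tied with the third largest. The `lambda` wrapper inside `apply` is my own choice for applying `nlargest` per city, not something from the comment:

```python
import pandas as pd

# The 18 Austin rows from the question's dataset
austin = ["rain", "rain", "cloudy", "sunny", "rain", "partly cloudy",
          "partly cloudy", "partly cloudy", "Sunny", "cloudy", "Sunny",
          "Sunny", "cloudy", "cloudy", "partly cloudy", "partly cloudy",
          "Sunny", "rain"]
df = pd.DataFrame({"City": ["Austin"] * len(austin), "Weather": austin})

counts = df.groupby("City")["Weather"].value_counts()

# head(3): exactly 3 rows per city, so one of the tied 4s is dropped
by_head = counts.groupby(level=0, group_keys=False).head(3)

# nlargest(3, keep="all"): every value tied with the 3rd largest is kept
by_nlargest = counts.groupby(level=0, group_keys=False).apply(
    lambda s: s.nlargest(3, keep="all")
)

print(by_head.to_string())
print(by_nlargest.to_string())
```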

2 Answers


Check out this post, which explains why you're getting "Truth value of a Series is ambiguous".

And regarding your question, I'm not sure I understand the expected output.

See the code/results below (assuming `df` is the DataFrame that holds your dataset):

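listOfCites = set(df['City'])  # every city present in the data

def show_top3_weather(df):
    # First three rows of each city (in row order, not by frequency)
    df1 = df.groupby('City').head(3).reset_index(drop=True)
    # Number of distinct Weather values among those rows, per city
    df2 = df1.drop_duplicates().groupby('City', as_index=False).count().rename(columns={'Weather':'WeatherOccu'})
    # Attach that count to every row, then drop repeated (City, Weather) rows
    df3 = df1.merge(df2, on='City', how='left').drop_duplicates()

    city_name = input("Choose a city: ")
    
    if city_name in listOfCites:
        # .any() reduces the boolean Series to a single True/False,
        # which avoids the "truth value of a Series is ambiguous" error
        if (df3.loc[df3.City == city_name]['WeatherOccu'] == 3).any():
            print(f"Below, the top three weathers of {city_name}:")
            print(df3[df3.City == city_name][['City', 'Weather']])
        else:
            print(f"{city_name} has not three different weathers!")
    else:
        print(f"{city_name} doesn't exist!")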

>>> show_top3_weather(df)

with New York as input

Choose a city:  New York
Below, the top three weathers of New York:
       City Weather
0  New York   Sunny
1  New York    Rain
2  New York  cloudy

with Austin as input

Choose a city:  Austin
Austin has not three different weathers!

with Los Angeles as input

Choose a city:  Los Angeles
Los Angeles has not three different weathers!
Timeless
  • Thank you very much for the above and for the link to the other post, very informative; it's almost what I am looking for! The only difference is that there should be no user input for the cities, so when `show_top3_weather(df)` is run, it outputs all the results for Los Angeles, New York and Austin in one go. – Marie Aug 21 '22 at 12:17
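Since the results end up in a text file and there should be no `input()`, one option is to loop over every city and write its counts, plus a note when fewer than three unique descriptions exist, to any text stream. A sketch under those assumptions; `write_weather_report` is a hypothetical name, and unlike the answer above it ranks descriptions by frequency rather than taking the first three rows:

```python
import io

import pandas as pd


def write_weather_report(df, out, n=3):
    """Hypothetical helper: write every City's top-n Weather counts to `out`.

    `out` can be anything with a .write() method, e.g. an open text file
    or io.StringIO. No user input is needed; all cities are reported.
    """
    # Grouping by a single column name yields (city, Series of Weather values)
    for city, weather in df.groupby("City")["Weather"]:
        counts = weather.value_counts()  # most common first
        if len(counts) < n:
            out.write(
                f"Note: cannot show the top {n} Weather descriptions for "
                f"{city}, as there are only {len(counts)} unique value(s).\n"
            )
        out.write(f"{city}\n{counts.head(n).to_string()}\n\n")


# Usage with a small hypothetical sample, writing to an in-memory buffer
df = pd.DataFrame({
    "City": ["Austin", "Austin", "Austin", "Los Angeles", "Los Angeles"],
    "Weather": ["rain", "cloudy", "sunny", "Sunny", "Sunny"],
})
buffer = io.StringIO()
write_weather_report(df, buffer)
report = buffer.getvalue()
print(report)
```

To write to a real file instead, pass an open handle, e.g. `with open("weather_report.txt", "w") as fh: write_weather_report(df, fh)` (the file name is only an example).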

We can print percentages to show that the weather in Los Angeles was always sunny. Optionally, we can also add an "other" row showing the percentage of the weather types that were left out.

Taking into account that some items may appear with equal frequency, I suggest trying this code:

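import pandas as pd


def get_nlargest(df, n, keep):
    "n, keep: see help('pandas.Series.nlargest')"
    # Share of each weather type within one city, keeping the n largest
    # (keep='all' also keeps every value tied with the n-th largest)
    top_n = df.value_counts(normalize=True).nlargest(n, keep)
    # Whatever share is left over belongs to the types that were cut
    other = pd.Series({'other': 1 - top_n.sum()})
    return pd.concat([top_n, other])


def weather_nlargest(df, n=3, keep='all'):
    # Apply get_nlargest to the Weather column of each city
    return (
        df
        .groupby(['City'])['Weather']
        .apply(get_nlargest, n, keep)
    )


def print_percentage(df):
    # Format every float as a whole-number percentage
    print(df.to_string(float_format='{:.0%}'.format))


df['Weather'] = df['Weather'].str.lower()   # sunny == Sunny, rain == Rain
print_percentage(weather_nlargest(df))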

Output:

City                      
Austin       sunny            28%
             partly cloudy    28%
             cloudy           22%
             rain             22%
             other             0%
Los Angeles  sunny           100%
             other             0%
New York     sunny            35%
             rain             24%
             cloudy           18%
             partly cloudy    18%
             other             6%

To show no more than 3 weather types:

print_percentage(weather_nlargest(df, 3, 'first'))

Output:

City                      
Austin       sunny            28%
             partly cloudy    28%
             cloudy           22%
             other            22%
Los Angeles  sunny           100%
             other             0%
New York     sunny            35%
             rain             24%
             cloudy           18%
             other            24%
Vitalizzare
  • This is exactly the type of output I am looking for. Unfortunately, on my end I am getting the error `File "", line 8 _:=x.value_counts(normalize=True).nlargest(n, keep), ^ SyntaxError: invalid syntax`. Could you also explain what exactly this line is doing? Thank you so much, I really appreciate the help! – Marie Aug 21 '22 at 20:22
  • There's a [walrus operator](https://realpython.com/python-walrus-operator/) `:=` in that line. It was introduced in Python 3.8, so if you are using Python 3.7 or lower, that is probably the cause of this error. Which version of Python are you using? – Vitalizzare Aug 21 '22 at 20:27
  • I am using python version 3.6 – Marie Aug 21 '22 at 20:35
  • @Marie Try the updated code without `:=`. As for what that line was doing: inside the `pd.concat` call, I assigned the first series to a temporary variable named `_`, and used it to calculate the second series, which contains only `other`. – Vitalizzare Aug 21 '22 at 20:48
  • Code works perfectly, thank you very much for your help. I am going to study it and come back if I have any questions about how it works. Interesting to learn about the walrus operator too.
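For comparison, a sketch of both forms of `get_nlargest`. The walrus version is reconstructed from the line shown in the SyntaxError above, so the exact original may have differed; it only parses on Python 3.8+, since `:=` was introduced in 3.8. The second function uses an ordinary assignment, and on its own it also runs on 3.6. Both function names are hypothetical:

```python
import pandas as pd


def get_nlargest_walrus(x, n, keep):
    # Python 3.8+ only: `_ := ...` stores the top-n Series in `_` while
    # building the list, so the next element can use `_.sum()`.
    return pd.concat([
        _ := x.value_counts(normalize=True).nlargest(n, keep),
        pd.Series({"other": 1 - _.sum()}),
    ])


def get_nlargest_plain(x, n, keep):
    # Same result with an ordinary assignment (works before Python 3.8)
    top_n = x.value_counts(normalize=True).nlargest(n, keep)
    other = pd.Series({"other": 1 - top_n.sum()})
    return pd.concat([top_n, other])


# Usage on a small hypothetical sample
weather = pd.Series(["sunny", "sunny", "rain", "cloudy"])
print(get_nlargest_walrus(weather, 2, "first"))
print(get_nlargest_plain(weather, 2, "first"))
```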