
I have a dataframe that records the number of observations at different locations for different years. I am trying to make a bar plot showing the total number of observations at each location for each year, with the years for a given location shown in different colors. My approach is to group by location and, for each group, count the observations. (I don't think I need to set the index to the date, since I am grouping by location.) I have not been able to achieve this with the following code. Any help would be much appreciated.

fig, ax = plt.subplots(figsize=(40,15))
date=df['date']
value=df['value']
df.date = pd.to_datetime(df.date)


year_start=2015
year_stop = 2019
#ax=plt.gca()

for year in range(year_start, year_stop+1):
    ax=plt.gca()
    m=df.groupby(['location']).agg({'value': ['count']})


    plt.ylim(0,45000)
    m.plot(kind='bar', legend = False, figsize=(30,15), fontsize = 30)
    #ax.tick_params(axis='both', which='major', labelsize=25)
    plt.ylabel('Number of observations - O3', fontsize = 30, fontweight = 'bold')    

    plt.legend(loc='upper right', prop={'size': 7})
    fig_title='Diurnal_'+place
    plt.savefig(fig_title, format='png',dpi=500, bbox_inches="tight")

    print ('saved=', fig_title)
    plt.show()


The header looks like this:
                             date_utc                       date parameter  \
    212580  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   
    212581  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   
    212582  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   
    212583  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   
    212584  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   

                                               location  value   unit       city  \
    212580        ICRISAT Patancheru, Mumbai - TSPCB   37.7  µg/m³  Hyderabad   
    212581  Bollaram Industrial Area, Surat - TSPCB   39.5  µg/m³  Hyderabad   
    212582          IDA Pashamylaram, Surat - TSPCB   17.8  µg/m³  Hyderabad   
    212583               Sanathnagar, Hyderabad - TSPCB   56.6  µg/m³  Hyderabad   
    212584                  Zoo Park, Hyderabad - TSPCB   24.5  µg/m³  Hyderabad   

1 Answer


Since I was not able to fully reproduce your example, I implemented a toy example from what I understood. Please tell me if I understood something wrong. Here is my code:

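import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Toy data: one row per observation
df = pd.DataFrame(
    [['Mumbai', 2017, 10], ['Mumbai', 2017, 12], ['Mumbai', 2018, 20],
     ['Mumbai', 2018, 23], ['Abu Dhabi', 2017, 30], ['Abu Dhabi', 2018, 25]],
    columns=['Place', 'Year', 'Amount'])

# Count the observations per place and year; reset_index() turns the group keys back into columns
df_grouped = df.groupby(['Place', 'Year']).agg({'Amount': 'count'}).reset_index()

# One group of bars per place, one colored bar per year
sns.barplot(x='Place', y='Amount', hue='Year', data=df_grouped)
plt.show()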

This code draws a bar plot with the locations on the x-axis and the number of observations on the y-axis. Within each location's group, every year gets its own bar in its own color.
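The same counts can also be reshaped with pandas alone. The sketch below is not part of the original answer; it reuses the toy `Place`/`Year`/`Amount` frame and assumes that a wide table with one column per year is acceptable, which is the layout `DataFrame.plot(kind='bar')` uses for grouped bars.

```python
import pandas as pd

# Same toy data as in the answer: one row per observation.
df = pd.DataFrame(
    [
        ["Mumbai", 2017, 10],
        ["Mumbai", 2017, 12],
        ["Mumbai", 2018, 20],
        ["Mumbai", 2018, 23],
        ["Abu Dhabi", 2017, 30],
        ["Abu Dhabi", 2018, 25],
    ],
    columns=["Place", "Year", "Amount"],
)

# Count rows per (Place, Year), then move Year from the index into the columns.
# fill_value=0 covers place/year combinations that have no observations.
counts = (
    df.groupby(["Place", "Year"])["Amount"]
    .count()
    .unstack("Year", fill_value=0)
)
print(counts)

# With matplotlib installed, counts.plot(kind="bar") would draw one group of
# bars per place and one colored bar per year.
```

Whether to use this or the seaborn `hue` version is mostly a matter of taste; the wide table is convenient if the counts are also needed as numbers.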

Koralp Catalsakal
  • Thank you, Koralp. You understood it mostly right; the only difference is that my timestamps are full datetimes rather than plain years. How do I include the time information in this case? I am doing this: df_grouped = df.groupby(['location','Year']).agg({'value':'count'}).reset_index() followed by sns.barplot(x='location', y='value', hue='Year', data=df_grouped). It raises this error: 'location' is both an index level and a column label, which is ambiguous. – Learning_datascience Jan 22 '20 at 22:10
  • Well, if you use this code, the plot will show as many bars per place as there are distinct timestamp values for it. The year being a plain integer does not matter for `hue` here, so I would assume seaborn handles the formatting. – Koralp Catalsakal Jan 22 '20 at 22:16
  • How do I address the error "'location' is both an index level and a column label, which is ambiguous"? – Learning_datascience Jan 22 '20 at 22:18
  • Well, it would give that error if `location` is the name of both a column and an index level. I suggest removing it from the index. The code I posted did not seem to produce that error, though. – Koralp Catalsakal Jan 22 '20 at 22:21
  • df['Year'] = df['date'].map(lambda x: x.year) ... the code needed this too. Thanks, it finally worked! – Learning_datascience Jan 23 '20 at 21:45
  • https://stackoverflow.com/questions/60047258/pandas-grouping-and-resampling-for-a-bar-plot?noredirect=1#comment106198173_60047258 @Koralp, can you help with this question too? – Learning_datascience Feb 03 '20 at 22:12
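The two fixes discussed in the comments can be put together as a minimal sketch. The sample rows below are hypothetical (shortened location names, made-up dates), and the way the index ambiguity is produced here, with `set_index(..., drop=False)`, is only an assumption about how it might have arisen, since the thread does not show the asker's index.

```python
import pandas as pd

# Hypothetical rows using the question's column names.
df = pd.DataFrame(
    {
        "date": [
            "2018-03-01T10:15:00+05:30",
            "2018-07-12T11:15:00+05:30",
            "2019-01-05T11:15:00+05:30",
            "2019-01-05T12:15:00+05:30",
            "2019-02-01T09:00:00+05:30",
        ],
        "location": ["Sanathnagar", "Sanathnagar", "Sanathnagar",
                     "Zoo Park", "Zoo Park"],
        "value": [37.7, 39.5, 17.8, 56.6, 24.5],
    }
)

# Parse the ISO timestamps and derive an integer Year column.
# .dt.year is the vectorized form of map(lambda x: x.year).
df["date"] = pd.to_datetime(df["date"])
df["Year"] = df["date"].dt.year

# Assumption: 'location' ended up as an index level as well as a column,
# which makes groupby('location') ambiguous.
ambiguous = df.set_index("location", drop=False)

# Dropping the index level leaves 'location' only as a column.
fixed = ambiguous.reset_index(drop=True)

# Count observations per location and year, as in the answer.
counts = (
    fixed.groupby(["location", "Year"])
    .agg({"value": "count"})
    .reset_index()
)
print(counts)
```

The resulting `counts` frame has the `location`, `Year`, and `value` columns that the `sns.barplot(x='location', y='value', hue='Year', ...)` call from the comments expects.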