
First of all, I am sorry if I am not describing the problem correctly, but the example should make my issue clear.

I have this dataframe and need to plot it sorted by date. There are many dates (around 60), so pandas automatically chooses which dates to label on the x-axis, and the chosen dates look arbitrary. For readability I also want only some dates labelled on the x-axis, but they should follow a pattern, such as January of every year.

This is my code:

import pandas as pd
import matplotlib.pyplot as plt

# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv('dbo.Access_Stat_all.csv', on_bad_lines='skip', usecols=['Range_Start','Format','Resource_ID','Number'])
df1 = df[df['Resource_ID'] == 32543]
df1 = df1[['Format','Range_Start','Number']].copy()
df1["Range_Start"] = df1["Range_Start"].str[:7]  # keep only 'YYYY-MM'
df1 = df1.groupby(['Format','Range_Start'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if 'entry' in df1.index:  # Index.contains() was removed in pandas 1.0
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = pd.concat([df1, df2.to_frame().T])  # DataFrame.append() was removed in pandas 2.0
print(df2)
df2.to_csv('test.csv', sep="\t", float_format='%.f')
if 'entry' in df2.index:
    df2.T[['entry','sum']].plot(rot = 30)
else:
    df2.T[['sum']].plot(kind = 'bar')
ax1 = plt.gca()  # current axes; plt.axes() adds a new, empty axes in recent matplotlib
ax1.legend(["Seitenzugriffe", "Dateiabrufe"])
plt.xlabel("")
plt.savefig('image.png')

This is the plot

As you can see, the plot uses 2010-08, 2013-09, and 2014-07 as x-axis tick labels. How can I make them something like 2010-01, 2013-01, 2014-01, etc.?

Thank you very much. I know this is not the best description, but English is not my first language and this is the best I could come up with.

tdube
MessitÖzil
  • To have full control over the datetime formatting on the x-axis, you would need to use matplotlib's date locators and formatters. Since they cannot be used on axes created via pandas, using matplotlib from the beginning will be the best option. See e.g. [this question](https://stackoverflow.com/questions/44213781/pandas-dataframe-line-plot-display-date-on-xaxis) – ImportanceOfBeingErnest Oct 30 '17 at 15:51

1 Answer


NOTE: Updated to answer the OP's question more directly.

You are mixing three interfaces: pandas plotting, the matplotlib pyplot API (the plt methods), and the matplotlib object-oriented API (the methods on the axes object, ax1 above). The latter two are distinct APIs and may not work correctly when mixed. The matplotlib documentation recommends the object-oriented API:

"While it is easy to quickly generate plots with the matplotlib.pyplot module, we recommend using the object-oriented approach for more control and customization of your plots. See the methods in the matplotlib.axes.Axes() class for many of the same plotting functions. For examples of the OO approach to Matplotlib, see the API Examples."
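As a rough illustration of the object-oriented style described above, the sketch below creates the figure and axes explicitly, lets pandas draw onto that axes through the `ax=` argument, and then uses only axes and figure methods. The numbers and the three month labels are made up for the example; only the legend labels come from the question.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the 'entry' and 'sum' rows of the real data
data = pd.DataFrame(
    {"entry": [2, 5, 3], "sum": [43, 60, 51]},
    index=["2014-05", "2014-06", "2014-07"],
)

fig, ax = plt.subplots()          # create the Figure and Axes explicitly
data.plot(ax=ax, rot=30)          # pandas draws onto the Axes we own
ax.legend(["Seitenzugriffe", "Dateiabrufe"])  # Axes method, not plt.legend
ax.set_xlabel("")                 # Axes method, not plt.xlabel

# Saved to an in-memory buffer here; a file path such as 'image.png' works the same way
buf = io.BytesIO()
fig.savefig(buf, format="png")    # Figure method, not plt.savefig
```

Every call after `plt.subplots()` goes through `ax` or `fig`, so nothing depends on which axes pyplot currently considers active.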

Here's how you can control the x-axis tick values and labels using matplotlib's date locators and formatters (see the matplotlib date example) with the object-oriented API. Also see the link in @ImportanceOfBeingErnest's comment, which points to an answer on another question covering incompatibilities between pandas' and matplotlib's datetime objects.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# prepare your data
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv('../../../so/dbo.Access_Stat_all.csv', on_bad_lines='skip', usecols=['Range_Start','Format','Resource_ID','Number'])
df.head()
df1 = df[df['Resource_ID'] == 10021]
df1 = df1[['Format','Range_Start','Number']].copy()
df1["Range_Start"] = df1["Range_Start"].str[:7]  # keep only 'YYYY-MM'
df1 = df1.groupby(['Format','Range_Start'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if 'entry' in df1.index:  # Index.contains() was removed in pandas 1.0
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = pd.concat([df1, df2.to_frame().T])  # DataFrame.append() was removed in pandas 2.0
print(df2)
df2.to_csv('test.csv', sep="\t", float_format='%.f')
if 'entry' in df2.index:
    # convert your index to use pandas datetime format
    df3 = df2.T[['entry','sum']].copy()
    df3.index = pd.to_datetime(df3.index)
    # for illustration, I changed a couple dates and added some dummy values
    # (a single .loc call avoids chained assignment, which may not modify df3)
    df3.loc[pd.Timestamp('2014-01-01'), 'entry'] = 48
    df3.loc[pd.Timestamp('2014-05-01'), 'entry'] = 28
    df3.loc[pd.Timestamp('2015-05-01'), 'entry'] = 36
    print(df3)

    # plot your data
    fig, ax = plt.subplots()

    # use matplotlib date formatters
    years = mdates.YearLocator()   # one tick per year, on 1 January
    yearsFmt = mdates.DateFormatter('%Y-%m')

    # format the major ticks
    ax.xaxis.set_major_locator(years)
    ax.xaxis.set_major_formatter(yearsFmt)

    ax.plot(df3)

    # add legend
    ax.legend(["Seitenzugriffe", "Dateiabrufe"])

    fig.savefig('image.png')
else:
    # left as an exercise...
    df2.T[['sum']].plot(kind = 'bar')
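Since the code above depends on a CSV file that readers will not have, here is a minimal self-contained sketch of the same tick technique. The data is synthetic (60 monthly values starting in 2010-08, chosen only to resemble the question), and the variable names `counts` and `tick_dates` are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 60 monthly values starting 2010-08
idx = pd.date_range("2010-08-01", periods=60, freq="MS")
counts = pd.Series(np.arange(60) % 12 + 10, index=idx)

fig, ax = plt.subplots()
ax.plot(counts.index, counts.to_numpy())

# One major tick per year (placed on 1 January), labelled as YYYY-MM
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate(rotation=30)  # rotate the date labels for readability

# Inspect where the locator put the ticks
tick_dates = mdates.num2date(ax.xaxis.get_majorticklocs())
print([d.strftime("%Y-%m") for d in tick_dates])
```

If a different pattern is wanted, `mdates.MonthLocator(bymonth=(1, 7))` should place ticks in January and July instead; I have not tried that variant against the original data.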
tdube
  • axes.plot(df2.T[['entry','sum']]) throws the following error: ValueError: could not convert string to float: '2017-06' – MessitÖzil Oct 30 '17 at 16:14
  • @Uttam Can you provide some sample data to test? – tdube Oct 30 '17 at 16:50
  • Access_Stat_ID,Resource_ID,Range_Start,Range_End,Name,Format,Number,Matched_URL
    6890859,10020,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","html",89,"/dissertationen/biologie/behrend-anke/HTML/behrend-vita.html"
    6890860,10021,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","pdf",30,"/dissertationen/biologie/dreier-lars/PDF/Dreier.pdf"
    6890861,10021,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","entry",2,"?"
    6890862,10021,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","html",11,"/dissertationen/biologie/dreier-lars/HTML/chapter4.html"
    – MessitÖzil Oct 30 '17 at 22:26
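The ValueError reported in the comments above is consistent with the index still holding 'YYYY-MM' strings when it reaches matplotlib. Newer matplotlib versions may instead treat such strings as categories, in which case date locators would still not apply. The sketch below uses the four sample rows from the last comment and converts the month strings to datetimes before any plotting. The pivot layout (months as rows, formats as columns) and the name `monthly` are choices made for this example, not the code from the answer.

```python
import io

import pandas as pd

# The four sample rows posted in the comment above
SAMPLE = '''Access_Stat_ID,Resource_ID,Range_Start,Range_End,Name,Format,Number,Matched_URL
6890859,10020,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","html",89,"/dissertationen/biologie/behrend-anke/HTML/behrend-vita.html"
6890860,10021,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","pdf",30,"/dissertationen/biologie/dreier-lars/PDF/Dreier.pdf"
6890861,10021,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","entry",2,"?"
6890862,10021,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","html",11,"/dissertationen/biologie/dreier-lars/HTML/chapter4.html"
'''

df = pd.read_csv(
    io.StringIO(SAMPLE),
    usecols=["Range_Start", "Format", "Resource_ID", "Number"],
)

# Keep one resource and truncate the timestamp string to 'YYYY-MM'
df1 = df[df["Resource_ID"] == 10021].copy()
df1["Range_Start"] = df1["Range_Start"].str[:7]

# One row per month, one column per format
monthly = df1.groupby(["Range_Start", "Format"])["Number"].last().unstack()

# The index is still made of strings at this point; convert it to real datetimes
monthly.index = pd.to_datetime(monthly.index, format="%Y-%m")
print(monthly)
```

With a `DatetimeIndex` in place, `ax.plot(monthly.index, monthly.to_numpy())` receives dates rather than strings, which is what the date locators and formatters expect.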