
I have the following dataset that I have obtained after manipulating my main dataset:

date     mean                std                 var
2011-01  1231.9032258064517  372.43266548811295  138706.09032258063
2011-02  1721.9642857142858  398.5088392665148   158809.29497354495
2011-03  2065.967741935484   550.9717163866624   303569.83225806465
2011-04  3162.3333333333335  1042.0935934316383  1085959.0574712649
2011-05  4381.322580645161   572.9278830182602   328246.3591397852
2011-06  4783.733333333334   444.4478118646375   197533.85747126423
2011-07  4559.387096774193   680.0907624437274   462523.44516129047
2011-08  4409.387096774193   809.8524012608451   655860.9118279569
2011-09  4247.266666666666   965.3772510172429   931953.2367816088
2011-10  3984.2258064516127  1103.4818442752744  1217672.180645161
2011-11  3405.5666666666666  788.2492903125009   621336.9436781612
2011-12  2816.8709677419356  958.7631873733815   919226.8494623657
2012-01  3120.7741935483873  872.852133704116    761870.8473118279
2012-02  3556.448275862069   870.7246401789595   758161.3990147784
2012-03  5318.548387096775   1251.1626816341582  1565408.055913978
2012-04  5807.466666666666   1308.938915709007   1713321.0850574712
2012-05  6318.225806451613   1078.4039969534429  1162955.180645161
2012-06  6761.0              954.2049949637621   910507.1724137934
2012-07  6567.967741935484   867.1837361586443   752007.6322580652
2012-08  6919.451612903225   794.0590170639995   630529.722580645
2012-09  7285.766666666666   979.1609245123879   958756.1160919542
2012-10  6414.225806451613   1941.8193995954312  3770662.5806451603
2012-11  5088.8              1129.7311978122693  1276292.5793103448
2012-12  3990.7419354838707  1803.227864464942   3251630.7311827955

To do:

I have to plot the distribution of average daily number of shared bikes against month/year (x-axis is the month/year)

My question is:

When I do

sns.distplot(avg_var_2['mean'], hist_kws=dict(edgecolor="k", linewidth=2))

it gives the output as:

[image: the resulting distribution plot]

I get `mean` on the x-axis. How can I modify the above code so that the month/year is on the x-axis and the task is achieved?

Either matplotlib or seaborn is fine for me.

Edit 1:

My main dataset:

    date        total_rides
    1/1/2011    985
    1/2/2011    801
    1/3/2011    1349
    1/4/2011    1562
    1/5/2011    1600
    1/6/2011    1606
    1/7/2011    1510
    1/8/2011    959
    1/9/2011    822
    1/10/2011   1321
    1/11/2011   1263
    1/12/2011   1162
    1/13/2011   1406
    1/14/2011   1421
    1/15/2011   1248
    1/16/2011   1204
    1/17/2011   1000
    1/18/2011   683
    1/19/2011   1650
    1/20/2011   1927
    1/21/2011   1543
    1/22/2011   981
    1/23/2011   986
    1/24/2011   1416
    1/25/2011   1985
    1/26/2011   506
    1/27/2011   431
    1/28/2011   1167
    1/29/2011   1098
    1/30/2011   1096
    1/31/2011   1501
    2/1/2011    1360
    2/2/2011    1526
    2/3/2011    1550
    2/4/2011    1708
    2/5/2011    1005
    2/6/2011    1623
    2/7/2011    1712
    2/8/2011    1530
    2/9/2011    1605
    2/10/2011   1538
    2/11/2011   1746
    2/12/2011   1472
    2/13/2011   1589
    2/14/2011   1913
    2/15/2011   1815
    2/16/2011   2115
    2/17/2011   2475
    2/18/2011   2927
    2/19/2011   1635
    2/20/2011   1812
    2/21/2011   1107
    2/22/2011   1450
    2/23/2011   1917
    2/24/2011   1807
    2/25/2011   1461
    2/26/2011   1969
    2/27/2011   2402
    2/28/2011   1446

It has data up to 12/31/2012. I couldn't upload the entire dataset because it is 731 rows long.
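For reference, here is a minimal sketch of how a monthly summary like the one at the top could be derived from the daily data. The five sample rows are copied from the table above and only stand in for the full 731-row dataset; the helper name `monthly_stats` is hypothetical, and the exact steps used to build `avg_var_2` originally may have differed.

```python
import pandas as pd

# A few rows copied from the daily table above; the real data has 731 rows.
daily = pd.DataFrame({
    "date": ["1/1/2011", "1/2/2011", "1/3/2011", "2/1/2011", "2/2/2011"],
    "total_rides": [985, 801, 1349, 1360, 1526],
})


def monthly_stats(df):
    """Summarize daily rides per month (hypothetical helper)."""
    out = df.copy()
    # Dates are written month/day/year, e.g. 1/13/2011.
    out["date"] = pd.to_datetime(out["date"], format="%m/%d/%Y")
    # Group by calendar month; the period prints as "2011-01".
    month = out["date"].dt.to_period("M")
    stats = out.groupby(month)["total_rides"].agg(["mean", "std", "var"])
    stats.index = stats.index.astype(str)
    stats.index.name = "date"
    return stats.reset_index()


avg_var_2 = monthly_stats(daily)
print(avg_var_2)
```

Note that pandas computes `std` and `var` with `ddof=1` (the sample estimate) by default.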

zeeman
  • Do you want to change the name of the x-label? If so, you can do that with `plt.xlabel("date")` – philoez98 Jun 21 '19 at 20:19
  • @philoez98 I don't want to change the name of the x-label. I want the month/year column to be on the x-axis and the distribution of the `mean` column must be plotted – zeeman Jun 21 '19 at 20:20
  • @zeeman it looks like your data has already been summarized. You have only one item for each month. Using this data you can plot a line graph, but not a distribution. Do you have the individual records for each bike usage? – SNygard Jun 21 '19 at 20:26
  • @SNygard Hey, I have added my original dataset in the question. Was it this that you were asking? – zeeman Jun 21 '19 at 20:30
  • Maybe this might be helpful: https://stackoverflow.com/questions/9103166/multiple-axis-in-matplotlib-with-different-scales?rq=1 – philoez98 Jun 21 '19 at 20:31
  • A distribution shows the probability density as a function of the data values. This means the x axis naturally has the units of the data values, not dates. If you were to look for a distribution of the dates, you would find that each date just occurs equally often (no surprise); but in turn it means the question makes no sense at all. – ImportanceOfBeingErnest Jun 21 '19 at 21:43
  • If you want to plot the mean for every day, try using seaborn: 1. convert your date column to a [datetime column](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) 2. try `sns.lineplot(x='date', y='mean', data=avg_var_2)` – steven Jun 22 '19 at 15:14
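Following the comments, one way to get month/year on the x-axis is a line plot of the monthly mean rather than a distribution plot. The sketch below uses plain matplotlib instead of the `sns.lineplot` call suggested above, and adds a shaded band of one standard deviation as an optional extra. It assumes `avg_var_2` has the `date`, `mean`, and `std` columns shown in the question; only the first four months are reproduced here, with values rounded, and the function name `plot_monthly_mean` is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First four months of the summarized data from the question (rounded);
# the real frame has one row per month through 2012-12.
avg_var_2 = pd.DataFrame({
    "date": ["2011-01", "2011-02", "2011-03", "2011-04"],
    "mean": [1231.90, 1721.96, 2065.97, 3162.33],
    "std": [372.43, 398.51, 550.97, 1042.09],
})


def plot_monthly_mean(df):
    """Plot the monthly mean against month/year (hypothetical helper)."""
    data = df.copy()
    # Parse "2011-01" style strings so the axis is a real date axis.
    data["date"] = pd.to_datetime(data["date"], format="%Y-%m")
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(data["date"], data["mean"], marker="o", label="monthly mean")
    # Shaded band: mean plus/minus one standard deviation.
    ax.fill_between(
        data["date"],
        data["mean"] - data["std"],
        data["mean"] + data["std"],
        alpha=0.2,
        label="mean ± 1 std",
    )
    ax.set_xlabel("month/year")
    ax.set_ylabel("average daily rides")
    ax.legend()
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return ax


ax = plot_monthly_mean(avg_var_2)
```

Call `plt.show()` afterwards to display the figure, or `ax.figure.savefig(...)` to write it to a file.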
