
This is the first time I've asked something here, so sorry if I'm doing anything wrong. I have this data in a pandas DataFrame:

     Year  Month  PassengerCountSum        Date  DateOrd   Prediction
0    2006      9        2720100.000  2006-09-01   732555  2815063.471
1    2007      5        3056934.000  2007-05-01   732797  2908360.055
2    2012      2        2998119.000  2012-02-01   734534  3578013.633
3    2008      4        3029021.000  2008-04-01   733133  3037895.807
4    2006     10        2834959.000  2006-10-01   732585  2826629.163
...   ...    ...                ...         ...      ...          ...
124  2007      7        3382382.000  2007-07-01   732858  2931876.962
125  2009      6        3419595.000  2009-06-01   733559  3202128.637
126  2012      9        3819379.000  2012-09-01   734747  3660130.047
127  2013     10        3910790.000  2013-10-01   735142  3812411.661
128  2011      6        3766323.000  2011-06-01   734289  3483560.480

I need to make a graph with Date on the x-axis and PassengerCountSum on the y-axis. I also need to show the Prediction values as a linear regression line.

There is no problem when I do this:

plt.plot(df_pass_by_year_pd['Date'] , df_pass_by_year_pd['Prediction'])

It draws a perfect regression line.

But when I replace df_pass_by_year_pd['Prediction'] with df_pass_by_year_pd['PassengerCountSum'] to show the real values from the DataFrame, like this:

plt.plot(df_pass_by_year_pd['Date'] , df_pass_by_year_pd['PassengerCountSum'])

The graph goes crazy and draws things I don't really understand.

[Image: plot]

Does anyone see the problem? Thanks, all!

I have tried changing the column type and reshaping the array, but I'm pretty new to all of this, so any help or tip is welcome.

  • The points are connected in the order in which they appear in the dataframe. As all the points of `Prediction` lie on one line, you don't see the wiggling. You can sort the dataframe on `Date` to obtain a plot going left to right. – JohanC Dec 28 '22 at 14:03
  • That was the problem, as @chrslg pointed out too. I sorted the DataFrame and everything looks fine now. Thank you very much. – BigGiorgio Dec 28 '22 at 14:30

1 Answer


Your data are not sorted, so plt.plot draws a line between each pair of consecutive points (x, y), (x', y') in row order.

You had the same problem with the prediction, too. You believe you see one straight line, but what you actually saw is a myriad of straight segments superimposed on one another. Since your predictions are perfectly aligned, you didn't notice it.
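To make the point about superimposed segments concrete, here is a small check in plain Python. It uses the DateOrd and Prediction values of the first five rows shown in the question, in row order; the names `points`, `steps`, and `slopes` are only illustrative.

```python
# (DateOrd, Prediction) pairs for the first five rows of the question, in row order.
points = [
    (732555, 2815063.471),
    (732797, 2908360.055),
    (734534, 3578013.633),
    (733133, 3037895.807),
    (732585, 2826629.163),
]

# x-step and slope of each segment a line plot would draw between consecutive rows.
steps = [x1 - x0 for (x0, _), (x1, _) in zip(points, points[1:])]
slopes = [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(points, points[1:])]

for dx, slope in zip(steps, slopes):
    direction = "forward" if dx > 0 else "backward"
    print(f"dx = {dx:+5d} ({direction}), slope = {slope:.4f}")
```

Some steps go backward along the x-axis, yet every segment has practically the same slope, so the segments overlap and look like a single line.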

Solution: sort your data by date before plotting.
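A minimal sketch of that, assuming pandas and matplotlib are installed. The DataFrame is rebuilt here from the first five rows of the question so the snippet runs on its own; `df_sorted`, the axis labels, and the legend are illustrative choices, not part of the original code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First five rows from the question, kept in their original (unsorted) order.
df_pass_by_year_pd = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2006-09-01", "2007-05-01", "2012-02-01", "2008-04-01", "2006-10-01"]
    ),
    "PassengerCountSum": [2720100.0, 3056934.0, 2998119.0, 3029021.0, 2834959.0],
    "Prediction": [2815063.471, 2908360.055, 3578013.633, 3037895.807, 2826629.163],
})

# Sort by date so consecutive rows are also consecutive along the x-axis.
df_sorted = df_pass_by_year_pd.sort_values("Date")

fig, ax = plt.subplots()
ax.plot(df_sorted["Date"], df_sorted["PassengerCountSum"], label="PassengerCountSum")
ax.plot(df_sorted["Date"], df_sorted["Prediction"], label="Prediction")
ax.set_xlabel("Date")
ax.set_ylabel("Passengers")
ax.legend()
```

Call plt.show() afterwards to display the figure. With the full DataFrame, only the sort_values line is needed before the existing plt.plot calls.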

chrslg
  • Oh my, I feel pretty ashamed now; it looks so obvious now, hahaha. Thank you. That was the problem. I sorted the DataFrame and everything looks fine now. – BigGiorgio Dec 28 '22 at 14:29