I currently have a method that uses Pandas to filter data from a massive .csv, and then uses matplotlib to graph a scatter plot of the filtered data, and then a line chart on top of that using the means of the data grouped by week. So there's a layer that has raw data, and then a processed line on top of that.

To accomplish that, I have to convert one of the columns from a string to DateTime using `pd.to_datetime()`. This, however, makes it impossible to run a regression on it. I can't find a way to do this easily using Pandas, so is there a way to convert the DateTime column to an int or float, run the regression to make a trend line, and then overlay it onto my graph?

I'm not really sure which parts of my code would be useful here but if there is a section that would help solve this I'd be happy to include it!

  • `This, however, makes it impossible to run a regression on it` Why is this the case? What have you tried that's not working? – MattR Jan 21 '20 at 21:12
  • Probably a duplicate of https://stackoverflow.com/questions/20576618/pandas-datetime-column-to-ordinal – Jody Klymak Jan 21 '20 at 21:13

1 Answer

Would

import matplotlib.dates as mdates

mdates.date2num(time_var)

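work? It converts the datetime to a float in fractional days since an epoch: 0001-01-01 00:00:00 UTC in older Matplotlib versions, and 1970-01-01 00:00:00 UTC by default since Matplotlib 3.3.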
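For the trend line itself, here is a minimal sketch of how the converted floats could feed a regression and be overlaid on the scatter plot. The column names `date` and `value` and the sample data are hypothetical stand-ins for the filtered CSV, and `np.polyfit` is just one way to fit a straight line; it is not something the question specifies.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the filtered CSV: date strings plus a numeric column.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-08", "2020-01-15", "2020-01-22", "2020-01-29"],
    "value": [1.0, 3.0, 5.0, 7.0, 9.0],
})

# Convert the strings first; date2num expects real datetimes, not raw strings.
df["date"] = pd.to_datetime(df["date"])

# Fractional days as floats, usable as the x variable of a regression.
x = mdates.date2num(df["date"])

# Least-squares straight line: value ~ slope * x + intercept.
slope, intercept = np.polyfit(x, df["value"], 1)

# Scatter the raw data, then overlay the fitted line on the same date axis.
fig, ax = plt.subplots()
ax.scatter(df["date"], df["value"], label="raw data")
ax.plot(df["date"], slope * x + intercept, color="red", label="trend line")
ax.legend()
fig.autofmt_xdate()
```

The slope comes out in units of `value` per day, because `date2num` counts days. Plotting against the original datetime column keeps the axis labelled with dates rather than raw floats.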

lsterzinger
  • I just tried this but it looks as though matplotlib reads the Pandas DateTime column as a string. I may be doing what you're suggesting incorrectly though. – Cameron McMains Jan 22 '20 at 01:11
  • Are you doing it with the raw column data, or the data after you pass it through `pd.to_datetime()`? If it's the former, try `mdates.date2num(pd.to_datetime(time_var))` – lsterzinger Jan 22 '20 at 01:15
  • I appreciate the help! I figured it out. I was mistakenly passing the original imported data frame through what you suggested, so it hadn't been converted to a date in the first place. Once I figured that out, your suggestion did the trick! – Cameron McMains Jan 22 '20 at 01:35