I have a CSV file with approx. 3000 lines in the following format:

Timestamp,PV Generation (W)
2019-01-01 00:00:00,616.54
2019-01-01 00:15:00,617.75
2019-01-01 00:30:00,752.56

I am using pandas to read it using:

columnnames = ['Timestamp','PV Generation (W)'] 
df = pandas.read_csv(filename, sep=',', engine='python', names=columnnames,index_col=None)  
timestamp = df['Timestamp']
pvgeneration = df['PV Generation (W)']
pvgenerationlist=pvgeneration.tolist() #create a list with pvgeneration column 
timestamplist=timestamp.tolist() #create a list with timestamp column

Because the first row of each column is the header, I remove the first element of each list with:

timestamplist.pop(0)  
pvgenerationlist.pop(0) 

The pvgeneration values are strings, so I convert them to float:

pvgenfloat = []
for item in pvgenerationlist:
    pvgenfloat.append(float(item))

I now have two lists that I wish to plot using matplotlib. I want the Y axis to show the float PV generation values, and the X axis to show some of the timestamp string values.

plt.plot(timestamplist,pvgenfloat)

gives a mess because it plots all 3000 timestamps as labels on the X axis.

By hovering over the chart with the mouse I can read the values under the cursor: x=2019-01-01 00:00:00 y=616.54

So I tried to use plt.xticks to show only the first and last string elements of my timestamplist:

xlisttoplot=[''] * len(timestamplist)  #i create a new list to plot with same length but blank elements
xlisttoplot[0]=timestamplist[0] #i keep the first element of the timestamp list which is 2019-01-01 00:00:00 
xlisttoplot[-1]=timestamplist[-1] #and i also keep the last one
plt.xticks(timestamplist,xlisttoplot,rotation=45,size = 8)

The above works in that it shows only the first and last value on the X axis, but now I can't use the mouse to read the values. When hovering, it reads: x= y=616.54

I need to be able to read the data by hovering. Any clue on how to solve this, or how to do the same thing in a different way? Thank you

S.Filip
  • If your question is solved, say thank you by checking as accepted and/or pushing the up arrow. If a better one shows up you can always change your selection. The check is below the up/down arrow at the top left of the answer. Leave a comment if it doesn't answer the question. https://stackoverflow.com/help/someone-answers – RichieV Sep 11 '20 at 19:00

1 Answer

You can plot directly from the pandas DataFrame.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_csv(filename, sep=',', engine='python', index_col=None)

# pandas reads the first line as column names by default
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['PV Generation (W)'] = df['PV Generation (W)'].astype(float)

# set `Timestamp` as index so it will become the x-axis
df.set_index('Timestamp', inplace=True)

df.plot()
plt.show()

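Because `Timestamp` is now a datetime index rather than a list of strings, the x-axis is treated as a date axis and only a readable number of tick labels is drawn. (Honestly, I'm not sure whether this is handled by pandas or by a matplotlib date locator.) Either way, the result has fewer dates on the x-axis, which keeps them readable.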
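To illustrate the difference, here is a minimal sketch comparing the two cases with plain matplotlib. The data is synthetic (3000 rows at 15-minute spacing standing in for the CSV), and the variable names are only for illustration. Plotting strings gives a categorical axis with one tick per string, while plotting datetimes lets the date locator pick a small number of ticks.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for the CSV (assumption): 3000 rows, 15 minutes apart
index = pd.date_range("2019-01-01", periods=3000, freq="15min")
df = pd.DataFrame({"PV Generation (W)": [float(i) for i in range(3000)]}, index=index)
df.index.name = "Timestamp"

# Case 1: x values are strings, so matplotlib builds a categorical axis
# with one tick per distinct string.
string_x = df.index.strftime("%Y-%m-%d %H:%M:%S").tolist()
fig_str, ax_str = plt.subplots()
ax_str.plot(string_x, df["PV Generation (W)"].to_numpy())
n_string_ticks = len(ax_str.get_xticks())

# Case 2: x values are datetimes, so matplotlib uses a date axis and
# chooses the tick positions automatically.
fig_dt, ax_dt = plt.subplots()
ax_dt.plot(df.index, df["PV Generation (W)"].to_numpy())
n_date_ticks = len(ax_dt.get_xticks())

print(n_string_ticks, n_date_ticks)
# Call plt.show() to inspect both figures interactively.
```

On the date axis the hover readout keeps showing the x value, since no labels were blanked out.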

Otherwise, you can control the xticks as in this question.
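As a rough sketch of controlling the ticks by hand, `matplotlib.dates` provides locators and formatters for date axes. The weekly spacing, the format strings and the synthetic data below are arbitrary choices for illustration, not something taken from the question. Setting `ax.fmt_xdata` changes only the text shown while hovering, so the tick labels can stay sparse while the readout keeps the full timestamp.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for the CSV (assumption): 3000 rows, 15 minutes apart
index = pd.date_range("2019-01-01", periods=3000, freq="15min")
values = pd.Series([float(i) for i in range(3000)], index=index)

fig, ax = plt.subplots()
ax.plot(values.index, values.to_numpy())

# One major tick every 7 days, labelled with the date only
ax.xaxis.set_major_locator(mdates.DayLocator(interval=7))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))

# Text shown for x when hovering over the plot: full timestamp
ax.fmt_xdata = mdates.DateFormatter("%Y-%m-%d %H:%M:%S")

# Rotate the tick labels so they do not overlap
fig.autofmt_xdate(rotation=45)

n_ticks = len(ax.get_xticks())
hover_text = ax.format_xdata(mdates.date2num(index[0]))
# Call plt.show() to view the figure interactively.
```

If only the first and last timestamps are wanted as labels, `ax.set_xticks([index[0], index[-1]])` should do the same job without touching the hover text.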

RichieV
  • This code almost works. Instead of having the timestamp on the X axis, it plots a yearly division on the Y axis (1990, 2000, 2010, 2020), and then it plots the timestamp as a straight line at the height where the date is 2019. I'm not familiar with pandas, but I assume that pd.to_datetime only reads the year. Any clue or link on how to fix this and plot the date on the X axis? Thank you sir. – S.Filip Sep 07 '20 at 15:27
  • @S.Filip it was because `df` had a default RangeIndex, try it now – RichieV Sep 07 '20 at 17:10
  • Thank you so much for your help! – S.Filip Sep 11 '20 at 18:42