I'll soon be working on a machine learning project with a lot of data, so I'm trying to emulate the kind of challenges I will face in order to prepare. The first one is plotting datetimes (date on the x axis and hour on the y axis) from a .csv like the following (containing this type of data from April 2021 and April 2020):
What I have now is the following plot from the 2021 set:
But I don't see why the dots don't line up with the grid. For instance, the last point in April 2021 should be the last date I have in the file (which is 07:37:56 30/04/2021):
But as you can see on the plot below:
The last dot in April is closer to 07:00:00 than to 08:00:00 on the y ticks, and it sits between two x ticks, as if it were halfway between day 29 and day 30. This is just one example of what happens all over the plot, and I don't know what is causing it.
My code on Google Colab is the following:
#LIBRARIES AND PRE PROCESSING
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as pltd
import numpy as np
from datetime import datetime
caminho_dados = "/content/sample.csv"
df = pd.read_csv(caminho_dados,encoding='UTF-8',sep=',')
novo=df['ID;Hora;Data;;;'].str.split(';').str
df['ID']=novo[0]
df['Hora']=novo[1]
df['Data']=novo[2]
df.drop(columns=['ID;Hora;Data;;;'],inplace=True)
df['Data']=pd.to_datetime(df['Data'],format ='%d/%m/%Y')
df['Hora']=pd.to_datetime(df['Hora'],format ='%H:%M:%S')
#SEPARATING THE TWO APRILS I HAVE ON TWO DATAFRAMES, 2021 ONE & 2020 ONE
df_vinteum = df.iloc[:50, :]
df_vinte = df.iloc[50:, :]
#PLOTTING
fig, ax = plt.subplots()
ax.scatter(df_vinteum['Data'], df_vinteum['Hora'])
ax.xaxis.set_major_formatter(pltd.DateFormatter('%d'))
ax.yaxis.set_major_formatter(pltd.DateFormatter('%H'))
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 1))
start, end = ax.get_ylim()
ax.yaxis.set_ticks(np.arange(start,end,(end-start)/24))
plt.title("Abril 2021")
plt.xlabel('Dias')
plt.ylabel('Horas')
plt.xticks(rotation=45)
plt.grid()
plt.rcParams["figure.figsize"] = (20,6)
plt.show()
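One possible explanation, sketched here without Matplotlib so it can be run anywhere: the ticks are built with `np.arange(start, end, 1)` from the axis limits, and those limits are normally the data range plus a margin, so the first tick (and every one after it) may not fall on midnight. The `'%d'` formatter then hides the time-of-day part of each tick. The helper names `ticks_from_limits` and `ticks_on_whole_days`, the 5% margin (Matplotlib's default autoscale margin) and the date range are assumptions made for illustration, not something taken from the real file.

```python
from datetime import datetime, timedelta


def ticks_from_limits(first, last, step, margin=0.05):
    """Mimic np.arange(start, end, step) applied to padded axis limits.

    Assumption: the limits are the data range widened by `margin`
    on each side, as Matplotlib's default autoscaling does.
    """
    pad = (last - first) * margin
    start, end = first - pad, last + pad
    ticks = []
    t = start
    while t < end:
        ticks.append(t)
        t += step
    return ticks


def ticks_on_whole_days(first, last):
    """Place one tick at midnight of every day covered by the data."""
    day = first.replace(hour=0, minute=0, second=0, microsecond=0)
    ticks = []
    while day <= last:
        ticks.append(day)
        day += timedelta(days=1)
    return ticks


# Hypothetical data range: dates parsed with '%d/%m/%Y' land on midnight.
first = datetime(2021, 4, 1)
last = datetime(2021, 4, 30)

padded = ticks_from_limits(first, last, timedelta(days=1))
snapped = ticks_on_whole_days(first, last)

print(padded[0])   # 2021-03-30 13:12:00, so every grid line sits at 13:12
print(snapped[0])  # 2021-04-01 00:00:00
```

If this is what is happening, a point dated 30/04 at midnight would land between the grid lines labelled 29 and 30, which matches the symptom described. The y axis would be affected in a similar way, since its step is `(end-start)/24` over the padded range rather than exactly one hour. In Matplotlib itself, `matplotlib.dates.DayLocator` and `matplotlib.dates.HourLocator` place ticks on whole days and whole hours, which may be a simpler replacement for the two `set_ticks` calls.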