1

I have a dataframe with financial data in it (Date, Open, Close, Low, High).

I want to graph the date versus one column (eg. Open). When I convert the "Date" column to a date, the graph does not display correctly. Why could this be happening?

import pandas as pd
import matplotlib.pyplot as plt

def plot_one_data(df, column, title= "No Title", color = "black"):

    df.plot(x="Date", y=column, title=title)
    #show plot
    filename= title+ ".png"
    plt.savefig(filename)


#Load in the csv files
df = pd.read_csv('data/df.csv')
print(df.head())
plot_one_data(df, "Open", "Before Converting to Date - good but no dates on x axis")


df['Date'] = pd.to_datetime(df["Date"])
print(df.head())
plot_one_data(df, "Open", "After Converting to Date - bad!")

Text Output:

       Date         Open     ...        Adj Close      Volume
0  14-08-06  1266.670044     ...      1268.209961  2118020000
1  15-08-06  1268.189941     ...      1285.579956  2334100000
2  16-08-06  1285.270020     ...      1295.430054  2554570000
3  17-08-06  1295.369995     ...      1297.479980  2458340000
4  18-08-06  1297.479980     ...      1302.300049  2033910000

[5 rows x 7 columns]
        Date         Open     ...        Adj Close      Volume
0 2006-08-14  1266.670044     ...      1268.209961  2118020000
1 2006-08-15  1268.189941     ...      1285.579956  2334100000
2 2006-08-16  1285.270020     ...      1295.430054  2554570000
3 2006-08-17  1295.369995     ...      1297.479980  2458340000
4 2006-08-18  1297.479980     ...      1302.300049  2033910000

Before

After

Mel
  • 6,214
  • 10
  • 54
  • 71

1 Answers1

2

Solution - add format

Updated line

df['Date'] = pd.to_datetime(df["Date"], format='%d-%m-%y')

I found that looking at the first 30 entries (before and after changing to datetime) helped. It works until the the end of August 2006 then jumps to the 9th of January!

First 30 entries in original data

        Date         Open     ...        Adj Close      Volume
0   14-08-06  1266.670044     ...      1268.209961  2118020000
1   15-08-06  1268.189941     ...      1285.579956  2334100000
2   16-08-06  1285.270020     ...      1295.430054  2554570000
3   17-08-06  1295.369995     ...      1297.479980  2458340000
4   18-08-06  1297.479980     ...      1302.300049  2033910000
5   21-08-06  1302.300049     ...      1297.520020  1759240000
6   22-08-06  1297.520020     ...      1298.819946  1908740000
7   23-08-06  1298.729980     ...      1292.989990  1893670000
8   24-08-06  1292.969971     ...      1296.060059  1930320000
9   25-08-06  1295.920044     ...      1295.089966  1667580000
10  28-08-06  1295.089966     ...      1301.780029  1834920000
11  29-08-06  1301.569946     ...      1304.280029  2093720000
12  30-08-06  1303.699951     ...      1305.369995  2060690000
13  31-08-06  1304.250000     ...      1303.819946  1974540000 #Smooth change from August to September
14  01-09-06  1303.800049     ...      1311.010010  1800520000
15  05-09-06  1310.939941     ...      1313.250000  2114480000
16  06-09-06  1313.040039     ...      1300.260010  2329870000
17  07-09-06  1300.209961     ...      1294.020020  2325850000
18  08-09-06  1294.020020     ...      1298.920044  2132890000
19  11-09-06  1298.859985     ...      1299.540039  2506430000
20  12-09-06  1299.530029     ...      1313.000000  2791580000
21  13-09-06  1312.739990     ...      1318.069946  2597220000
22  14-09-06  1318.000000     ...      1316.280029  2351220000
23  15-09-06  1316.280029     ...      1319.660034  3198030000
24  18-09-06  1319.849976     ...      1321.180054  2325080000
25  19-09-06  1321.170044     ...      1317.640015  2390850000
26  20-09-06  1318.280029     ...      1325.180054  2543070000
27  21-09-06  1324.890015     ...      1318.030029  2627440000
28  22-09-06  1318.030029     ...      1314.780029  2162880000
29  25-09-06  1314.780029     ...      1326.369995  2710240000

First 30 entries after changing to datetime

[30 rows x 7 columns]
         Date         Open     ...        Adj Close      Volume
0  2006-08-14  1266.670044     ...      1268.209961  2118020000
1  2006-08-15  1268.189941     ...      1285.579956  2334100000
2  2006-08-16  1285.270020     ...      1295.430054  2554570000
3  2006-08-17  1295.369995     ...      1297.479980  2458340000
4  2006-08-18  1297.479980     ...      1302.300049  2033910000
5  2006-08-21  1302.300049     ...      1297.520020  1759240000
6  2006-08-22  1297.520020     ...      1298.819946  1908740000
7  2006-08-23  1298.729980     ...      1292.989990  1893670000
8  2006-08-24  1292.969971     ...      1296.060059  1930320000
9  2006-08-25  1295.920044     ...      1295.089966  1667580000
10 2006-08-28  1295.089966     ...      1301.780029  1834920000
11 2006-08-29  1301.569946     ...      1304.280029  2093720000
12 2006-08-30  1303.699951     ...      1305.369995  2060690000
13 2006-08-31  1304.250000     ...      1303.819946  1974540000 #Fine until here
14 2006-01-09  1303.800049     ...      1311.010010  1800520000 #Problem here
15 2006-05-09  1310.939941     ...      1313.250000  2114480000
16 2006-06-09  1313.040039     ...      1300.260010  2329870000
17 2006-07-09  1300.209961     ...      1294.020020  2325850000
18 2006-08-09  1294.020020     ...      1298.920044  2132890000
19 2006-11-09  1298.859985     ...      1299.540039  2506430000
20 2006-12-09  1299.530029     ...      1313.000000  2791580000
21 2006-09-13  1312.739990     ...      1318.069946  2597220000
22 2006-09-14  1318.000000     ...      1316.280029  2351220000
23 2006-09-15  1316.280029     ...      1319.660034  3198030000
24 2006-09-18  1319.849976     ...      1321.180054  2325080000
25 2006-09-19  1321.170044     ...      1317.640015  2390850000
26 2006-09-20  1318.280029     ...      1325.180054  2543070000
27 2006-09-21  1324.890015     ...      1318.030029  2627440000
28 2006-09-22  1318.030029     ...      1314.780029  2162880000
29 2006-09-25  1314.780029     ...      1326.369995  2710240000

Here is the fixed image: enter image description here

Mel
  • 6,214
  • 10
  • 54
  • 71
  • basically a duplicate of [pandas.to_datetime inconsistent time string format](https://stackoverflow.com/q/15929861/10197418) - pandas to_datetime assumes month first by default if you don't specify a format or set keyword `dayfirst=True`. – FObersteiner Jan 09 '23 at 07:46