I have two dataframes that contain daily end-of-day market data. They are supposed to have identical start dates, end dates, and row counts, but when I print the len of each, one has one more row than the other:
DF1
            close
date
2008-01-01  45.92
2008-01-02  45.16
2008-01-03  45.33
2008-01-04  42.09
2008-01-07  46.98
...
[2870 rows x 1 columns]
DF2
            close
date
2008-01-01  60.48
2008-01-02  59.71
2008-01-03  58.43
2008-01-04  56.64
2008-01-07  56.98
...
[2871 rows x 1 columns]
How can I show which row either:
- is a duplicate, or
- has an extra date,
so that I can delete the (probably weekend/holiday) row that is in DF2 but not in DF1?
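To make the goal concrete, here is a minimal sketch of one possible way to surface both cases, assuming `date` is a `DatetimeIndex` on each frame (as the printed output suggests). The small frames below are hypothetical stand-ins for DF1 and DF2, and the extra 2008-01-05 row with its price is invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for DF1 and DF2; "date" is the index.
df1 = pd.DataFrame(
    {"close": [45.92, 45.16, 45.33, 42.09, 46.98]},
    index=pd.DatetimeIndex(
        ["2008-01-01", "2008-01-02", "2008-01-03", "2008-01-04", "2008-01-07"],
        name="date",
    ),
)
df2 = pd.DataFrame(
    # The 55.00 row on Saturday 2008-01-05 is made up for this example.
    {"close": [60.48, 59.71, 58.43, 56.64, 55.00, 56.98]},
    index=pd.DatetimeIndex(
        ["2008-01-01", "2008-01-02", "2008-01-03", "2008-01-04",
         "2008-01-05", "2008-01-07"],
        name="date",
    ),
)

# Case 1: rows whose date occurs more than once (keep=False flags every copy).
dupes_in_df2 = df2[df2.index.duplicated(keep=False)]

# Case 2: dates present in one frame but missing from the other.
only_in_df2 = df2.index.difference(df1.index)
only_in_df1 = df1.index.difference(df2.index)

# Drop the dates that appear only in df2 so both frames cover the same days.
df2_aligned = df2.drop(index=only_in_df2)

print(dupes_in_df2)
print(only_in_df2)
```

Checking both directions (`only_in_df2` and `only_in_df1`) matters because equal-looking lengths can still hide mismatched dates.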
I have tried things like:
df1 = df1.drop_duplicates(subset='date', keep='first')
df2 = df2.drop_duplicates(subset='date', keep='first')
but I can't get it to work [ValueError: not enough values to unpack (expected 2, got 0)].
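One thing that may be relevant here: the `subset` argument of `drop_duplicates` refers to column labels, while in the printed output `date` appears to be the index rather than a column. A hedged sketch of deduplicating on the index instead, using a hypothetical frame with one repeated date:

```python
import pandas as pd

# Hypothetical frame in which 2008-01-02 appears twice in the index.
df = pd.DataFrame(
    {"close": [1.0, 2.0, 2.5, 3.0]},
    index=pd.DatetimeIndex(
        ["2008-01-01", "2008-01-02", "2008-01-02", "2008-01-03"],
        name="date",
    ),
)

# Keep the first row for each index value; later repeats are dropped.
deduped = df[~df.index.duplicated(keep="first")]

# Alternative: turn the index into a column so subset="date" can find it.
deduped_alt = (
    df.reset_index()
      .drop_duplicates(subset="date", keep="first")
      .set_index("date")
)
```

Both variants keep the first occurrence of each date; which one reads better is a matter of taste.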
Extra:
How do I remove weekend dates from a dataframe?
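For the weekend part, a minimal sketch of one common approach, again assuming `date` is a `DatetimeIndex`. The frame below is hypothetical. Note that this only filters Saturdays and Sundays; holidays that fall on weekdays would need a separate calendar.

```python
import pandas as pd

# Hypothetical frame covering Fri 2008-01-04 through Tue 2008-01-08.
df = pd.DataFrame(
    {"close": [42.09, 42.50, 43.00, 46.98, 47.10]},
    index=pd.date_range("2008-01-04", periods=5, freq="D", name="date"),
)

# dayofweek is 0 for Monday through 6 for Sunday, so < 5 keeps Mon-Fri.
weekdays_only = df[df.index.dayofweek < 5]

print(weekdays_only)
```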