How to fill missing dates with corresponding NaN in other columns

Question

I have a CSV that initially creates following dataframe:

    Date        Portfoliovalue
0   2021-05-01  50000.0
1   2021-05-05  52304.0

Using the following script, I would like to fill the missing dates and have a corresponding NaN value in the Portfoliovalue column with NaN. So the result would be this:

    Date        Portfoliovalue
0   2021-05-01  50000.0
1   2021-05-02  NaN
2   2021-05-03  NaN
3   2021-05-04  NaN
4   2021-05-05  52304.0

I first tried the method here: Fill the missing date values in a Pandas Dataframe column

However the bfill replaces all my NaN's and removing it only returns an error.

So far I have tried this:

df = pd.read_csv("Tickers_test5.csv")
df2 = pd.read_csv("Portfoliovalues.csv")

portfolio_value = df['Currentvalue'].sum()
portfolio_value = portfolio_value + cash

date = datetime.date(datetime.now())

df2.loc[len(df2)] = [date, portfolio_value]

print(df2.asfreq('D'))

However, this only returns this:

            Date    Portfoliovalue
1970-01-01  NaN     NaN

Thanks for your help. I am really impressed at how helpful this community is.

Quick update:

I have added the code, so that it fills my missing dates. However, it is part of a programme, which tries to update the missing dates every time it launches. So when I execute the code and no dates are missing, I get the following error:

ValueError: cannot reindex from a duplicate axis”

The code is as follows:

df2 = pd.read_csv("Portfoliovalues.csv")


portfolio_value = df['Currentvalue'].sum()

date = datetime.date(datetime.now())


df2.loc[date, 'Portfoliovalue'] = portfolio_value

#Solution provided by Uts after asking on Stackoverflow
df2.Date = pd.to_datetime(df2.Date)
df2 = df2.set_index('Date').asfreq('D').reset_index()

So by the looks of it the code adds a duplicate date, which then causes the .reindex() function to raise the ValueError. However, I am not sure how to proceed. Is there an alternative to .reindex() or maybe the assignment of today's date needs changing?

your source date starts at 1 and ends with 10, however, your expected output ends on day 5 — sammywemmy, May 22 '21 at 10:25
Check this post it did help me https://stackoverflow.com/a/54517808/14566331 — Azarkuz, May 22 '21 at 10:36

score 3 · Accepted Answer · answered May 22 '21 at 10:52

3

Pandas has asfreq function for datetimeIndex, this is basically just a thin, but convenient wrapper around reindex() which generates a date_range and calls reindex.

Code

df.Date = pd.to_datetime(df.Date)
df = df.set_index('Date').asfreq('D').reset_index()

Output

    Date        Portfoliovalue
0   2021-05-01  50000.0
1   2021-05-02  NaN
2   2021-05-03  NaN
3   2021-05-04  NaN
4   2021-05-05  52304.0

answered May 22 '21 at 10:52

Utsav

5,572
2
29
43

1

Thank you, I will tick it once my reputation reaches 15 points. – Charlie_Sierra May 22 '21 at 20:59
I have just updated the original question, as there is a little follow-up problem. Has anybody got an idea how I could proceed? – Charlie_Sierra May 25 '21 at 14:04
Can you open a separate question with the issue, please provide sample input, sample output and error stack trace and your code in the question. U can link this question to the new question if you want. – Utsav May 25 '21 at 15:14

score 1 · Answer 2 · answered May 22 '21 at 10:44

1

Pandas has reindex method: given a list of indices, it remains only indices from list.

In your case, you can create all the dates you want, by date_range for example, and then give it to reindex. you might needed a simple set_index and reset_index, but I assume you don't care much about the original index.

Example:

df.set_index('Date').reindex(pd.date_range(start=df['Date'].min(), end=df['Date'].max(), freq='D')).reset_index()

On first we set 'Date' column as index. Then we use reindex, it full list of dates (given by date_range from minimal date to maximal date in 'Date' column, with daily frequency) as new index. It result nans in places without former value.

answered May 22 '21 at 10:44

Roim

2,986
2
10
25

For this solution I get the following Error: TypeError: '<=' not supported between instances of 'str' and 'datetime.date' – Charlie_Sierra May 22 '21 at 13:26
well I guess I (falsely assumed) you work with type pd.Timedelta, while the type of Date is actually string in your case. You should parse it first to be different type: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html – Roim May 22 '21 at 13:40
Thank you, I will look into it to convert it into a datetime format. However, I thought by using date = datetime.date(datetime.now()) I would have such a format? – Charlie_Sierra May 22 '21 at 21:00
it is another possibility, but I'm not sure you actually change the date that way: check carefully if indeed the type has changed – Roim May 23 '21 at 10:46

How to fill missing dates with corresponding NaN in other columns

2 Answers2