Elegant way to create empty pandas DataFrame with NaN of type float

Question

I want to create a Pandas DataFrame filled with NaNs. During my research I found an answer:

import pandas as pd

df = pd.DataFrame(index=range(0,4),columns=['A'])

This code results in a DataFrame filled with NaNs of dtype "object", so it cannot be used later with, for example, the interpolate() method. I therefore created the DataFrame with this more complicated code (inspired by this answer):

import pandas as pd
import numpy as np

dummyarray = np.empty((4,1))
dummyarray[:] = np.nan

df = pd.DataFrame(dummyarray)

This results in a DataFrame filled with NaN of type "float", so it can be used later on with interpolate(). Is there a more elegant way to create the same result?

I had to put dummyarray = np.empty((4,1)) for it to work for me. – Goofball Aug 21 '16 at 19:29

ojdo · Accepted Answer · 2021-08-27T08:11:16.903

143

Simply pass the desired value as the first argument, such as 0, math.inf or, here, np.nan. The constructor then allocates the value array with the shape given by the index and columns arguments and fills it with that value:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])

>>> df
    A   B
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN

>>> df.dtypes
A    float64
B    float64
dtype: object
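As a minimal sketch of why the float dtype matters for the question's use case, the snippet below builds the frame the same way, sets two known values, and calls interpolate(). The values 1.0 and 4.0 are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# The scalar np.nan is broadcast to the shape given by index and columns.
df = pd.DataFrame(np.nan, index=range(4), columns=["A"])

# Illustrative values only: set the first and last rows.
df.loc[0, "A"] = 1.0
df.loc[3, "A"] = 4.0

# Default (linear) interpolation fills the NaNs in between.
filled = df.interpolate()
print(filled["A"].tolist())  # [1.0, 2.0, 3.0, 4.0]
```

Because the column is already float64, no conversion is needed before interpolating.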

edited Aug 27 '21 at 08:11

answered May 05 '15 at 12:51

ojdo


Works for pd.Series too. Excellent! – Bill Sep 19 '16 at 16:54
Replacing `np.nan` with `np.full(shape=(4,2), fill_value=np.nan)` saves running time! Test with `10000` rows and `10` columns, for example. – Say OL May 05 '23 at 08:48
@SayOL: I cannot confirm. `%timeit df = pd.DataFrame(np.full(shape=(100_000,26), fill_value=np.nan), index=range(100_000), columns=range(26))` is only slightly faster (1.63 ms/loop) than the simpler version (1.67 ms) on my machine. What speedup do you observe, and with which exact setup? – ojdo Jul 28 '23 at 08:43
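For anyone who wants to check the timing claim on their own machine, here is a rough sketch using the standard timeit module instead of IPython's %timeit. The helper names from_scalar and from_full are hypothetical, and the results will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n_rows, n_cols = 10_000, 10  # sizes suggested in the comment above


def from_scalar():
    # Let the constructor broadcast the scalar NaN.
    return pd.DataFrame(np.nan, index=range(n_rows), columns=range(n_cols))


def from_full():
    # Pre-build the NaN array with NumPy and hand it over.
    data = np.full((n_rows, n_cols), np.nan)
    return pd.DataFrame(data, index=range(n_rows), columns=range(n_cols))


for fn in (from_scalar, from_full):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e3:.3f} ms per call")
```

Both functions should return equal frames, so any difference is purely in construction time.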

score 14 · Answer 2 · answered May 05 '15 at 12:48

You could specify the dtype directly when constructing the DataFrame:

>>> df = pd.DataFrame(index=range(0,4),columns=['A'], dtype='float')
>>> df.dtypes
A    float64
dtype: object

Specifying the dtype forces Pandas to try creating the DataFrame with that type, rather than trying to infer it.
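A small side-by-side sketch of the difference, assuming the behaviour described in the question (object dtype when nothing is specified). The variable names inferred and explicit are only for illustration.

```python
import pandas as pd

# No data and no dtype: pandas has nothing to infer a numeric type from.
inferred = pd.DataFrame(index=range(4), columns=["A"])

# Same call with dtype given: the NaNs are stored as float64.
explicit = pd.DataFrame(index=range(4), columns=["A"], dtype="float")

print(inferred.dtypes["A"], explicit.dtypes["A"])
```

Both frames print as columns of NaN, so checking dtypes is the only reliable way to tell them apart.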

score 7 · Answer 3 · answered Jun 20 '18 at 10:24

You can pass np.nan as the data and build the index with np.arange, replacing <num_rows> with the number of rows you need:

pd.DataFrame(np.nan, index=np.arange(<num_rows>), columns=['A'])

errorParser


score 6 · Answer 4 · answered Jan 28 '19 at 06:40

You can try this line of code:

pdDataFrame = pd.DataFrame([np.nan] * 7)

This creates a pandas DataFrame with 7 rows and a single column of NaN values of type float.

If you print pdDataFrame, the output will be:

    0
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN

The output of pdDataFrame.dtypes is:

0    float64
dtype: object

score 5 · Answer 5 · answered Aug 26 '19 at 14:02

For multiple columns, with nrow and ncol set to the desired shape, you can do:

df = pd.DataFrame(np.zeros([nrow, ncol])*np.nan)
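As a sketch of an alternative, np.full writes NaN directly instead of multiplying an array of zeros by NaN. This is a swapped-in technique, not part of the answer above, and the sizes 3 and 2 are arbitrary example values.

```python
import numpy as np
import pandas as pd

nrow, ncol = 3, 2  # arbitrary example sizes

# The approach from the answer: 0.0 * NaN is NaN for every element.
via_zeros = pd.DataFrame(np.zeros([nrow, ncol]) * np.nan)

# Alternative: fill the array with NaN in one step.
via_full = pd.DataFrame(np.full((nrow, ncol), np.nan))

print(via_full.dtypes.tolist())
print(via_zeros.equals(via_full))  # True
```

Both variants give float64 columns with default integer labels 0..ncol-1.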

Digio


score 0 · Answer 6 · answered Aug 02 '23 at 19:20

You can also pass a list of repeated empty dictionaries, one per row:

df = pd.DataFrame([{}] * 4, columns=['A'])

Radagast
