How can I fill in a missing values in range with Pandas?

Question

I have a dataset with a number of values like below.

>>> a.head()
   value  freq
3      9     1
2     11     1
0     12     4
1     15     2

I need to fill in the values between the integers in the value column. For example, I need to insert one new row between 9 & 11 filled with zeroes, then another two between 12-15. The end result should be the dataset with 9-15 with 'missing' rows as zeroes across the board.

Is there anyway to insert a new row at an specific location without replacing data? The only methods I've found involve slicing the dataframe at a location then appending a new row and concatenating the remainder.

UPDATE: The index is completely irrelevant so don't worry about that.

Are your Series objects so large that adding the values manually is out of the question? — kennes, Feb 08 '16 at 19:33
Probably, the sample set I have right now is small, but the actual dataset is about 300-400 records with out 16 columns, so ideally I need to find some way to fill in the missing values. — Battleroid, Feb 08 '16 at 19:34
@bleh That's my original thought but I was wondering if Pandas might have a way to fill in the values in between. — Battleroid, Feb 08 '16 at 19:39
I see that now -- removing above comment because it's not helpful. — kennes, Feb 08 '16 at 19:40

score 11 · Accepted Answer · answered Feb 08 '16 at 19:37

11

You didn't say what should happen to your Index, so I'm assuming it's unimportant.

In [12]: df.index = df['value']

In [15]: df.reindex(np.arange(df.value.min(), df.value.max() + 1)).fillna(0)
Out[15]:
       value  freq
value
9          9     1
10         0     0
11        11     1
12        12     4
13         0     0
14         0     0
15        15     2

answered Feb 08 '16 at 19:37

TomAugspurger

28,234
8
86
69

Yes, the index is completely unimportant. This looks ideal, going to give it a shot. – Battleroid Feb 08 '16 at 19:41
Thank you, this is perfect! – Battleroid Feb 09 '16 at 10:28
1

I've also found you can drop the `.fillna(0)` portion and just use `df.reindex(np.arange(df.value.min(), df.value.max() + 1), fill_value=0)` seems to accomplish the same thing. – Battleroid Feb 09 '16 at 13:57

BERA · Answer 2 · 2022-08-30T08:26:07.970

Another option is to create a second dataframe with values from min to max, and outer join this to your dataframe:

import pandas as pd

a = pd.DataFrame({'value':[9,11,12,15], 'freq':[1,1,4,2]})
#   value  freq
#0      9     1
#1     11     1
#2     12     4
#3     15     2

b = pd.DataFrame({'value':[x for x in range(a.value.min(), a.value.max()+1)]})
   value
0      9
1     10
2     11
3     12
4     13
5     14
6     15

a = pd.merge(left=a, right=b, on='value', how='outer').fillna(0).sort_values(by='value')

#   value  freq
#0      9   1.0
#4     10   0.0
#1     11   1.0
#2     12   4.0
#5     13   0.0
#6     14   0.0
#3     15   2.0

How can I fill in a missing values in range with Pandas?

2 Answers2

Linked

Related