1

I have a dataframe df

import pandas as pd
s = {'id': [243,243, 243, 243, 443,443,443],
 'st': [1,3,5,9,12, 18,23],
 'value':[2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1]}
df = pd.DataFrame(s)

which looks like:

    id  st  value
0  243   1    2.4
1  243   3    3.8
2  243   5    3.7
3  243   9    5.6
4  443  12    1.2
5  443  18    0.2
6  443  23    2.1

I need to fill in the missing rows based on the values in st, the values in value and id column should be copied from the previous record. My output should look something like

    id  st  value
   243   1    2.4
   243   2    2.4
   243   3    3.8
   243   4    3.8
   243   5    3.7
   243   6    3.7
   243   7    3.7
   243   8    3.7
   243   9    5.6
   243  10    5.6
   243  11    5.6
   443  12    1.2
   443  13    1.2

and so on.

How can I do this in pandas dataframe ?

Here I am trying to fill the missing records rather than filling just the missing values.

Archit
  • 542
  • 1
  • 4
  • 15
  • Possible duplicate of [Fill in missing pandas data with previous non-missing value, grouped by key](https://stackoverflow.com/questions/16345583/fill-in-missing-pandas-data-with-previous-non-missing-value-grouped-by-key) – Asmus May 06 '19 at 07:35
  • no that question is about filling missing values, my question is about filling the missing records. – Archit May 06 '19 at 07:44

1 Answers1

0

If values in column st are unique use DataFrame.reindex with forward filling missing values by ffill:

c = np.arange(df['st'].min(), df['st'].max()+1)
df1 = (df.set_index('st')
        .reindex(c)
        .ffill()
        .reset_index()
)

df1['id'] = df1['id'].astype(int)
print (df1)
    st   id  value
0    1  243    2.4
1    2  243    2.4
2    3  243    3.8
3    4  243    3.8
4    5  243    3.7
5    6  243    3.7
6    7  243    3.7
7    8  243    3.7
8    9  243    5.6
9   10  243    5.6
10  11  243    5.6
11  12  443    1.2
12  13  443    1.2
13  14  443    1.2
14  15  443    1.2
15  16  443    1.2
16  17  443    1.2
17  18  443    0.2
18  19  443    0.2
19  20  443    0.2
20  21  443    0.2
21  22  443    0.2
22  23  443    2.1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • But the id 243 should only be till st value 11 and not after, and then we should have value for id 443. as you can see in the expected output which I have put in the question. – Archit May 06 '19 at 07:41
  • @Archit - values in `st` are unique? If yes, solution was changed. If error `ValueError: cannot reindex from a duplicate axis` it means values are not unique, can you change sample data then? – jezrael May 06 '19 at 07:49
  • This might be a new question, but how can I put 0 in place of the value of the previous record. – Archit May 06 '19 at 13:05
  • @Archit - change `.reindex(c) .ffill()` to `.reindex(c, fill_value=0)` – jezrael May 06 '19 at 13:06
  • thank you , but as my dataframe have many groups on which same operation have to be done, it is throwing the error "ValueError: cannot reindex from a duplicate axis". how can I do it in when my st values are also repeating ? – Archit May 06 '19 at 13:16
  • the question I posted above was for doing the operation for a single group data but when I am adding data from all the groups, then in that case the st values are repeating from the minimum to maximum for each of the groups and hence are repeating. So how can I do the same operation as above on grouped data ? – Archit May 06 '19 at 13:19
  • @Archit - Is possible create new question? – jezrael May 06 '19 at 13:23