Pandas Dataframe fill column with sequence_id based on multiple columns ids and timestamp

Question

*I'm editing the df because it contained a typo in ne1_id.

I'm having a really hard time trying to solve the following, and I'd really appreciate any help or insight. I have a DataFrame df that looks like this:

	timestamp	user_id	ne1_id	ne2_id	attempt_no
0	18:11:42.838363	1	100		1
1	18:11:42.838364		100	123456
2	18:11:42.838365		100	123456
3	18:11:42.83836		100	123456
4	18:11:45.838365	1	100		2
5	18:11:45.838366		100	321234
6	18:11:45.838369		100	321234
7	18:11:46.838363	3	12		3
8	18:11:46.838364		12	9832
9	18:11:47.838363	2	12		4
10	18:11:47.838369		100
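To make the input copyable, the table above can be written as a DataFrame constructor. This is a minimal sketch that assumes every column is stored as strings and that the blank cells are empty strings (''), not NaN; the values are copied from the table as shown.

```python
import pandas as pd

# Input table from the question. Assumption: all values are strings and
# blank cells are empty strings ('') rather than NaN.
df = pd.DataFrame({
    'timestamp': ['18:11:42.838363', '18:11:42.838364', '18:11:42.838365',
                  '18:11:42.83836', '18:11:45.838365', '18:11:45.838366',
                  '18:11:45.838369', '18:11:46.838363', '18:11:46.838364',
                  '18:11:47.838363', '18:11:47.838369'],
    'user_id':    ['1', '', '', '', '1', '', '', '3', '', '2', ''],
    'ne1_id':     ['100', '100', '100', '100', '100', '100', '100',
                   '12', '12', '12', '100'],
    'ne2_id':     ['', '123456', '123456', '123456', '', '321234', '321234',
                   '', '9832', '', ''],
    'attempt_no': ['1', '', '', '', '2', '', '', '3', '', '4', ''],
})
print(df)
```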

What I want to do is fill the empty attempt_no cells (the empties are empty, not NaN) in the following rows, ordered by timestamp (or index), with the proper attempt_no by associating the user_id, ne1_id and ne2_id columns. I'm not seeing the logic of it, nor how to do it.

The result should be something like this:

	timestamp	user_id	ne1_id	ne2_id	attempt_no
0	18:11:42.838363	1	100		1
1	18:11:42.838364		100	123456	1
2	18:11:42.838365		100	123456
3	18:11:42.838369		100	123456
4	18:11:45.838365	1	100		2
5	18:11:45.838366		100	321234	2
6	18:11:45.838369		100	321234
7	18:11:46.838363	3	12		3
8	18:11:46.838364		12	9832	3
9	18:11:47.838363	2	12		4
10	18:11:47.838369		100		4

Something that says the following: "find all the rows where there is a user_id, find the next row with the same ne1_id and an empty user_id and attempt_no, and fill its attempt_no with the attempt_no of the previous row". I tried with groupby (which I believe is the way to do it), but I'm kind of stuck there.
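The quoted rule can be translated fairly literally into a loop. This is a minimal sketch, not a definitive solution: it assumes the rows are already sorted by timestamp, that the index is a default RangeIndex, and that blanks are empty strings. `fill_next_matching_row` is a hypothetical helper name. Note that in the sample data row 10 has ne1_id 100 while attempt 4 (row 9) is on ne1_id 12, so under this exact rule row 10 stays empty, unlike the expected table.

```python
import pandas as pd

# Only the columns this rule needs; blanks are '' (assumption).
df = pd.DataFrame({
    'user_id':    ['1', '', '', '', '1', '', '', '3', '', '2', ''],
    'ne1_id':     ['100', '100', '100', '100', '100', '100', '100',
                   '12', '12', '12', '100'],
    'attempt_no': ['1', '', '', '', '2', '', '', '3', '', '4', ''],
})

def fill_next_matching_row(df):
    """Hypothetical helper: for each row with a user_id, copy its attempt_no
    into the next later row with the same ne1_id and an empty user_id and
    attempt_no. Assumes rows are sorted by timestamp and a RangeIndex."""
    out = df.copy()
    starts = out[out['user_id'] != ''].index
    for i in starts:
        later = out.loc[i + 1:]  # label-based slice of the rows after i
        mask = ((later['ne1_id'] == out.at[i, 'ne1_id'])
                & (later['user_id'] == '')
                & (later['attempt_no'] == ''))
        if mask.any():
            # idxmax() on a boolean Series gives the label of the first True
            out.at[mask.idxmax(), 'attempt_no'] = out.at[i, 'attempt_no']
    return out

print(fill_next_matching_row(df)['attempt_no'].tolist())
```

Because only the first matching row is filled per attempt, rows 2, 3 and 6 are left empty, which is also how they appear in the expected table.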

I'd appreciate any suggestions.

df.attempt_no.mask(df.attempt_no.eq('')).fillna(method='ffill')?? — Nk03, May 30 '21 at 15:53
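The comment's idea is to turn the empty strings into NaN and then forward-fill. A minimal runnable sketch, using only the attempt_no column from the question; `Series.ffill()` is used because `fillna(method='ffill')` is deprecated in recent pandas versions. This approach ignores the id columns entirely, so every blank row gets the last attempt number above it.

```python
import pandas as pd

df = pd.DataFrame({'attempt_no': ['1', '', '', '', '2', '', '', '3', '', '4', '']})

# mask() replaces values where the condition is True with NaN,
# then ffill() carries the last non-missing attempt number downward.
filled = df['attempt_no'].mask(df['attempt_no'].eq('')).ffill()
print(filled.tolist())
```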
You haven't defined how attempts are associated. Currently it looks like forward fill attempt_no and reset the index. It's also unclear if those are spaces or NaN in the columns. Please provide your dataframe as a _copyable_ piece of code. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888) for more information. — Henry Ecker, May 30 '21 at 16:45
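If the association were defined as "same ne1_id", one option is a grouped forward fill. This is a sketch under that assumed rule (not one the question confirms), and it assumes the rows are sorted by timestamp. On the sample data it gives row 10 the value 2, because row 10 is on ne1_id 100, which again shows that the association needs to be pinned down.

```python
import pandas as pd

df = pd.DataFrame({
    'ne1_id':     ['100', '100', '100', '100', '100', '100', '100',
                   '12', '12', '12', '100'],
    'attempt_no': ['1', '', '', '', '2', '', '', '3', '', '4', ''],
})

# Assumed rule: a row inherits the most recent attempt_no seen earlier
# on the same ne1_id. Blanks become NaN so ffill can fill them.
attempts = df['attempt_no'].mask(df['attempt_no'].eq(''))
df['attempt_no'] = attempts.groupby(df['ne1_id']).ffill()
print(df['attempt_no'].tolist())
```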
@HenryEcker thanks, I just edited the question with that context. Those are just spaces, not NaN. I have also shared the proper ne2_id (network element #2), which would eventually be needed to associate all the columns with the proper attempt_no I need. — jpbrunori, May 30 '21 at 23:28
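Since the blanks are spaces rather than NaN, a common first step is converting empty or whitespace-only cells into NaN so that pandas' missing-value methods (isna, ffill, groupby(...).ffill) apply. A minimal sketch on a small toy frame (the values here are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Toy data: some cells are '' and some contain only spaces.
df = pd.DataFrame({'user_id': ['1', ' ', '', '2'],
                   'attempt_no': ['1', ' ', '  ', '4']})

# With regex=True, any cell matching "only whitespace (or nothing)"
# is replaced as a whole by NaN; other cells are left unchanged.
clean = df.replace(r'^\s*$', np.nan, regex=True)
print(clean.isna().sum().tolist())
```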

rudolfovic · Answer 1 · 2021-05-30T16:15:36.423


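import numpy as np
import pandas as pd

def f(x):
    # Forward-fill: replace each NaN with the last non-NaN value seen so far
    x = x.copy()  # work on a copy instead of mutating the column passed in
    last = np.nan
    for i in range(len(x)):
        if np.isnan(x.iloc[i]):  # positional access with .iloc
            x.iloc[i] = last
        else:
            last = x.iloc[i]
    return x

df = pd.DataFrame({'x': [1, None, None, 2, None, None, None, 3, None]})
df[['x']].apply(f)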

By applying the function along axis=0 (the default for DataFrame.apply), each column is passed to f as a whole Series, so the entire column can be processed jointly.
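The loop in f is a hand-written forward fill. For comparison, here is a short sketch of the same operation using pandas' built-in `Series.ffill()`, on the answer's own example column:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, None, None, 2, None, None, None, 3, None]})

# Each NaN takes the last non-NaN value above it, which is what
# the loop in f() does element by element.
print(df['x'].ffill().tolist())
# [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0]
```

The built-in method avoids the Python-level loop, which matters on large frames.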

edited May 30 '21 at 16:15

answered May 30 '21 at 16:02
