I have a dataset that looks like (with more columns and rows):

    id  type      value
0  104     0       7999
1  105     1  196193579
2  108     0     245744
3  NaN     1        NaN

Some rows have NaN values, and I already have the indexes of these rows. Now I would like to concatenate each of these rows with its previous row, skipping the NaN values. For example, if indexes=[3], the new dataframe should be:

    id  type      value
0  104     0       7999
1  105     1  196193579
2  108    01     245744

How can I do this?

NOTE: The first row will never be in the list of indexes I have. The solution must work from my list of indexes. I also know the names of the columns where the NaN values are, if that helps.
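
For reference, the sample can be rebuilt with the constructor below (my guess at the exact dtypes; the NaN entries force id and value to float):

import numpy as np
import pandas as pd

# Sample data as shown above; NaN makes 'id' and 'value' float64.
df = pd.DataFrame({
    'id': [104, 105, 108, np.nan],
    'type': [0, 1, 0, 1],
    'value': [7999, 196193579, 245744, np.nan],
})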

  • I think you're looking for `ffill()` or `fillna()` – Mark Aug 18 '23 at 15:40
  • Does this answer your question? [Filling missing values using forward and backward fill in pandas dataframe (ffill and bfill)](https://stackoverflow.com/questions/41589365/filling-missing-values-using-forward-and-backward-fill-in-pandas-dataframe-ffil) – Mark Aug 18 '23 at 15:42
  • 1
    Please provide your input as DataFrame constructor for clarity and extend the example with a few more rows. – mozway Aug 18 '23 at 15:42
  • Are the NaNs always in both `id` and `value`? Then why not just ignore them? And what dtypes are the columns? str? Please provide a [mre], like @mozway's saying. For specifics see [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Aug 18 '23 at 16:00
  • @Mark I don't want to fill those lines. I want to remove them and join what is not NaN into the previous row's data. – Learning from masters Aug 21 '23 at 11:24
  • @wjandrea Right, I want to remove those lines, but also join their non-NaN values into the previous rows, as explained above. – Learning from masters Aug 21 '23 at 11:24

2 Answers


A possible solution:

import pandas as pd

idx = 3

# Column by column, join the non-NaN values of row idx-1 and row idx as
# strings (e.g. the type values 0 and 1 become '01').
aux = [pd.Series([x, y]).dropna().astype(int).astype(str).sum()
       for x, y in zip(df.iloc[idx-1, :], df.iloc[idx, :])]

# Drop the NaN row (so the float columns can be cast back to int), then
# overwrite the previous row with the joined values.
df = df.drop(idx).astype(int)
df.iloc[idx-1, :] = aux

Output:

    id type      value
0  104    0       7999
1  105    1  196193579
2  108   01     245744
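
The snippet above handles a single index. As a rough sketch (my extension, not part of the answer), the same loop can cover a whole list of indexes, assuming a default RangeIndex and that no two listed rows are adjacent:

import pandas as pd

def merge_rows(df, indexes):
    df = df.copy()
    # Go from high to low so earlier positions stay valid after each drop.
    for idx in sorted(indexes, reverse=True):
        aux = [pd.Series([x, y]).dropna().astype(int).astype(str).sum()
               for x, y in zip(df.iloc[idx-1, :], df.iloc[idx, :])]
        df = df.drop(df.index[idx])
        df.iloc[idx-1, :] = aux
    return df.reset_index(drop=True)

out = merge_rows(df, [3])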
PaulS

If you have an external list of indices to merge with the rows above, you can use:

indexes = [3]

out = (df
   .astype({'type': str})   # make 'type' joinable as strings
   # rows at the listed indexes share a group with the row above them
   .groupby((~df.index.to_series().isin(indexes)).cumsum())
   .agg({'id': 'first', 'type': ''.join, 'value': 'first'})
)
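
For the sample data the grouper evaluates to [1, 2, 3, 3]: the ~isin mask is False exactly at the listed indexes, so the cumulative sum does not advance there and row 3 lands in the same group as row 2, where agg then joins the type strings and keeps the first id and value.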

If you have many columns, build the aggregation dictionary programmatically:

indexes = [3]

# Join string (object) columns, keep the first value otherwise. The dtypes
# are taken from the converted frame, so 'type' maps to ''.join.
d = {k: ''.join if t == object else 'first'
     for k, t in df.astype({'type': str}).dtypes.items()}

out = (df
   .astype({'type': str})
   .groupby((~df.index.to_series().isin(indexes)).cumsum())
   .agg(d)
)

Output:

      id type        value
1  104.0    0       7999.0
2  105.0    1  196193579.0
3  108.0   01     245744.0
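
The NaN row forces id and value to float in this output. If integer dtypes are wanted back, a small follow-up cast (my addition, not part of the answer) restores them:

# Cast the merged columns back to int and renumber the rows.
out = out.astype({'id': int, 'value': int}).reset_index(drop=True)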
mozway