I have a dataset that looks like (with more columns and rows):

    id  type      value
0  104     0       7999
1  105     1  196193579
2  108     0     245744
3  NaN     1        NaN

Some rows have NaN values, and I already have the indexes of these rows. Now I would like to concatenate each of these rows with its previous row, skipping the NaN values. For example, if indexes=[3], the new dataframe should be:

    id  type      value
0  104     0       7999
1  105     1  196193579
2  108    01     245744

How can I do this?

NOTE: The first row will never be in the list of indexes I have. The solution must work from my list of indexes. I also know the names of the columns where the NaN values are, if that helps.
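
For reference, the sample can be rebuilt with the constructor below (my guess at the exact dtypes; the NaN entries force id and value to float):

import numpy as np
import pandas as pd

# Sample data as shown above; NaN makes 'id' and 'value' float64.
df = pd.DataFrame({
    'id': [104, 105, 108, np.nan],
    'type': [0, 1, 0, 1],
    'value': [7999, 196193579, 245744, np.nan],
})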

  • I think you're looking for `ffill()` or `fillna()` – Mark Aug 18 '23 at 15:40
  • Does this answer your question? [Filling missing values using forward and backward fill in pandas dataframe (ffill and bfill)](https://stackoverflow.com/questions/41589365/filling-missing-values-using-forward-and-backward-fill-in-pandas-dataframe-ffil) – Mark Aug 18 '23 at 15:42
  • 1
    Please provide your input as DataFrame constructor for clarity and extend the example with a few more rows. – mozway Aug 18 '23 at 15:42
  • Are the NaNs always in both `id` and `value`? Then why not just ignore them? And what dtypes are the columns? str? Please provide a [mre], like @mozway's saying. For specifics see [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Aug 18 '23 at 16:00
  • @Mark I don't want to fill those lines. I want to remove them and join what is not NaN into the previous row's data. – Learning from masters Aug 21 '23 at 11:24
  • @wjandrea Right, I want to remove those lines, but also join their non-NaN values into the previous rows, as explained above. – Learning from masters Aug 21 '23 at 11:24

2 Answers


A possible solution:

import pandas as pd

idx = 3

# Column by column, join the non-NaN values of row idx-1 and row idx as
# strings (e.g. the type values 0 and 1 become '01').
aux = [pd.Series([x, y]).dropna().astype(int).astype(str).sum()
       for x, y in zip(df.iloc[idx-1, :], df.iloc[idx, :])]

# Drop the NaN row (so the float columns can be cast back to int), then
# overwrite the previous row with the joined values.
df = df.drop(idx).astype(int)
df.iloc[idx-1, :] = aux

Output:

    id type      value
0  104    0       7999
1  105    1  196193579
2  108   01     245744
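
The snippet above handles a single index. As a rough sketch (my extension, not part of the answer), the same loop can cover a whole list of indexes, assuming a default RangeIndex and that no two listed rows are adjacent:

import pandas as pd

def merge_rows(df, indexes):
    df = df.copy()
    # Go from high to low so earlier positions stay valid after each drop.
    for idx in sorted(indexes, reverse=True):
        aux = [pd.Series([x, y]).dropna().astype(int).astype(str).sum()
               for x, y in zip(df.iloc[idx-1, :], df.iloc[idx, :])]
        df = df.drop(df.index[idx])
        df.iloc[idx-1, :] = aux
    return df.reset_index(drop=True)

out = merge_rows(df, [3])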
PaulS

If you have an external list of indices to merge with the rows above, you can use:

indexes = [3]

out = (df
   .astype({'type': str})   # make 'type' joinable as strings
   # rows at the listed indexes share a group with the row above them
   .groupby((~df.index.to_series().isin(indexes)).cumsum())
   .agg({'id': 'first', 'type': ''.join, 'value': 'first'})
)
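
For the sample data the grouper evaluates to [1, 2, 3, 3]: the ~isin mask is False exactly at the listed indexes, so the cumulative sum does not advance there and row 3 lands in the same group as row 2, where agg then joins the type strings and keeps the first id and value.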

If you have many columns, build the aggregation dictionary programmatically:

indexes = [3]

# Join string (object) columns, keep the first value otherwise. The dtypes
# are taken from the converted frame, so 'type' maps to ''.join.
d = {k: ''.join if t == object else 'first'
     for k, t in df.astype({'type': str}).dtypes.items()}

out = (df
   .astype({'type': str})
   .groupby((~df.index.to_series().isin(indexes)).cumsum())
   .agg(d)
)

Output:

      id type        value
1  104.0    0       7999.0
2  105.0    1  196193579.0
3  108.0   01     245744.0
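
The NaN row forces id and value to float in this output. If integer dtypes are wanted back, a small follow-up cast (my addition, not part of the answer) restores them:

# Cast the merged columns back to int and renumber the rows.
out = out.astype({'id': int, 'value': int}).reset_index(drop=True)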
mozway