
I have a DataFrame where one of the columns is an integer id, and two other columns, start and end, hold floating-point values.

Some end values are NaN. For certain ids (e.g., id == 1), I want to replace those NaN end values with the start value from the same row.

Here's an example:

import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 1, 2, 2],
    "start": [1.1, 1.2, 1.3, 1.4, 1.5],
    "end": [1.1, float("nan"), 1.3, 1.4, float("nan")]
})

Afterwards, it should be:

df = pd.DataFrame({
    "id": [0, 1, 1, 2, 2],
    "start": [1.1, 1.2, 1.3, 1.4, 1.5],
    "end": [1.1, 1.2, 1.3, 1.4, float("nan")]
})
jinkins

2 Answers


The main trick is to build a boolean mask that selects the rows that have one of the desired ids and a NaN end value. We use .isin() so that several ids can be matched at once, and .isnull() to check for NaN values.

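import pandas as pd

df = pd.DataFrame(
    {
        "id": [0, 1, 1, 2, 2],
        "start": [1.1, 1.2, 1.3, 1.4, 1.5],
        "end": [1.1, float("nan"), 1.3, 1.4, float("nan")],
    }
)

# Ids whose missing end values should be filled; add more ids as needed
ids = [1]

# True only for rows that have one of the selected ids AND a NaN end value
indexer = df["id"].isin(ids) & df["end"].isnull()

# Copy start into end for exactly those rows
df.loc[indexer, "end"] = df.loc[indexer, "start"]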
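The same mask-and-assign approach can be wrapped in a small helper so it is easy to reuse with different id lists. This is only a sketch: the function name fill_end_from_start is hypothetical, and it assumes the id/start/end column names from the question. It works on a copy so the original DataFrame is left untouched.

```python
import pandas as pd


def fill_end_from_start(df: pd.DataFrame, ids) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which NaN 'end' values
    are replaced by 'start' for rows whose 'id' is in ids."""
    out = df.copy()
    # Row has one of the selected ids AND its end value is missing
    mask = out["id"].isin(ids) & out["end"].isna()
    # Copy start into end only for the masked rows
    out.loc[mask, "end"] = out.loc[mask, "start"]
    return out


df = pd.DataFrame(
    {
        "id": [0, 1, 1, 2, 2],
        "start": [1.1, 1.2, 1.3, 1.4, 1.5],
        "end": [1.1, float("nan"), 1.3, 1.4, float("nan")],
    }
)

only_one = fill_end_from_start(df, [1])    # fills row 1, row 4 stays NaN
one_and_two = fill_end_from_start(df, [1, 2])  # fills both NaN rows
print(only_one)
print(one_and_two)
```

Because the mask also requires a NaN end, existing end values for the selected ids are never overwritten.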
Michael Cao

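tl;dr df.end = df.end.fillna(df.start[df.id==1])

This works because fillna with a Series aligns on the index: df.start[df.id==1] only contains the rows where id == 1, so only the NaN end values in those rows are filled, and NaNs in other rows are left alone.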

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(
   ...:     {
   ...:         "id": [0, 1, 1, 2, 2],
   ...:         "start": [1.1, 1.2, 1.3, 1.4, 1.5],
   ...:         "end": [1.1, float("nan"), 1.3, 1.4, float("nan")],
   ...:     }
   ...: )

In [3]: df.end = df.end.fillna(df.start[df.id==1])

In [4]: df
Out[4]:
   id  start  end
0   0    1.1  1.1
1   1    1.2  1.2
2   1    1.3  1.3
3   2    1.4  1.4
4   2    1.5  NaN
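The one-liner above hard-codes id == 1. A sketch of the same fillna idea for a list of ids, assuming the column names from the question (ids is a hypothetical list chosen for illustration). It uses Series.where instead of boolean indexing: where puts NaN in the rows that are not selected, and fillna leaves a value NaN when its fill value is also NaN, so the effect is the same.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [0, 1, 1, 2, 2],
        "start": [1.1, 1.2, 1.3, 1.4, 1.5],
        "end": [1.1, float("nan"), 1.3, 1.4, float("nan")],
    }
)

ids = [2]  # hypothetical: ids whose missing end values should be filled

# Keep start only for the selected ids; every other row becomes NaN
fill_values = df["start"].where(df["id"].isin(ids))

# fillna aligns on the index, so only NaN ends in selected rows are filled
df["end"] = df["end"].fillna(fill_values)
print(df)
```

With ids = [2], row 4 gets end 1.5 while row 1 (id 1) keeps its NaN, which shows that the fill is restricted to the chosen ids.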