I have the following working code that sets 1 to "new_col" at the locations pointed by intervals dictated by starts
and ends
.
import pandas as pd
import numpy as np
df = pd.DataFrame({"a": np.arange(10)})
starts = [1, 5, 8]
ends = [1, 6, 10]
value = 1
df["new_col"] = 0
for s, e in zip(starts, ends):
df.loc[s:e, "new_col"] = value
print(df)
a new_col 0 0 0 1 1 1 2 2 0 3 3 0 4 4 0 5 5 1 6 6 1 7 7 0 8 8 1 9 9 1
I want these intervals to come from another dataframe pointer_df
.
How to vectorize this?
pointer_df = pd.DataFrame({"starts": starts, "ends": ends})
Attempt:
df.loc[pointer_df["starts"]:pointer_df["ends"], "new_col"] = 2
print(df)
obviously doesn't work and gives
raise AssertionError("Start slice bound is non-scalar") AssertionError: Start slice bound is non-scalar
EDIT:
it seems all answers use some kind of pythonic for loop.
the question was how to vectorize the operation above?
Is this not doable without for loops/list comprehentions?