I have a local DataFrame that gets new entries appended daily. Occasionally, an old entry is updated. The giveaway is that several columns match, but the timestamp is more recent.
To remove the old entry and keep the new (updated) one, I append the new entry and then "clean" the DataFrame by looping over the rows and finding the old entry:
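```python
del_rows = []
df2 = df.copy()
# Compare every row against every other row (O(n^2))
for index, row in df.iterrows():
    for index2, row2 in df2.iterrows():
        # Same key, but `row` is newer: mark the older row2 for deletion
        if row["crit1"] == row2["crit1"] and row["date"] > row2["date"]:
            del_rows.append(index2)
# index2 values are index labels, not positions, so drop by label
df = df.drop(index=list(set(del_rows)))
```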
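A more idiomatic pandas sketch of the same cleanup, assuming `crit1` is the only key column and `date` holds datetimes: sort by date, then keep the last (newest) row per key with `drop_duplicates`. The sample data and the `key_cols` name are hypothetical; if several columns identify an entry, list them all in `key_cols`.

```python
import pandas as pd

# Hypothetical sample data: "crit1" is the key column, "date" the timestamp
df = pd.DataFrame(
    {
        "crit1": ["a", "b", "a", "c"],
        "value": [1, 2, 3, 4],
        "date": pd.to_datetime(
            ["2023-01-01", "2023-01-02", "2023-01-05", "2023-01-03"]
        ),
    }
)

key_cols = ["crit1"]  # hypothetical: all columns that identify "the same" entry

# Stable sort so the newest entry for each key comes last,
# then keep only that last row per key
latest = (
    df.sort_values("date", kind="mergesort")
    .drop_duplicates(subset=key_cols, keep="last")
    .sort_index()  # optional: restore the original row order
)
print(latest)
```

Unlike the nested loop, this keeps exactly one row per key even when two rows share the newest timestamp; with a stable sort, the one appearing later in the DataFrame wins.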
While this works, I'd like to know the more "pandas" way of doing it. I know that `apply` and NumPy vectorization are faster; however, I can't think of a function I could pass to `apply` that would achieve this, or a way to vectorize it given the different data types.
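One vectorized option, sketched under the same assumptions (`crit1` as the key column, `date` as datetimes, hypothetical sample data), is `groupby(...).transform("max")`: it broadcasts each group's newest date back onto every row, so a single boolean comparison replaces both loops and no custom function is needed for `apply`.

```python
import pandas as pd

# Hypothetical sample data: "crit1" is the key column, "date" the timestamp
df = pd.DataFrame(
    {
        "crit1": ["a", "b", "a", "c"],
        "value": [1, 2, 3, 4],
        "date": pd.to_datetime(
            ["2023-01-01", "2023-01-02", "2023-01-05", "2023-01-03"]
        ),
    }
)

key_cols = ["crit1"]  # hypothetical: all columns that identify "the same" entry

# For each row, the most recent date among rows sharing its key
newest = df.groupby(key_cols)["date"].transform("max")

# Keep a row only if no row with the same key is strictly newer;
# like the nested loop, rows tied on the newest date are all kept
df_clean = df[df["date"] == newest]
print(df_clean)
```

This mirrors the loop's semantics exactly (including ties), while the `drop_duplicates` approach is simpler when exactly one row per key is wanted.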