I have timestamped data in a csv with a resolution of 10 minutes, as shown in this image. Each timestamp has an "id" which is either 1 or 0. I am trying to create a new column, "time", which returns 0 if "id" = 0, otherwise it should return the value of "time" in the next row plus 10. I am able to do this easily on Excel, and the formula I used is shown in the image.
I am trying to use Python to do the same thing. After importing the csv data as a dataframe using pandas, I had a look at this link and this link. I copied the "id" column to a new column, "time" and used the following code, but the values of "time" remain exactly the same (1 or 0).
for i, e in enumerate(df['time']):
if e==1:
df.at[i,'time']=df.at[i+1,'time']+10
Based on the solution in the first link, I tried changing the value of '1' to null and using if pd.isnull(e)
, but ended up getting TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
.
I attempted it again by ordering the rows in reverse, changing i+1
to i-1
but it didn't work either, and I was getting KeyError: -1
.
Does anyone have any suggestions on how I can do this in Python?
Edit:
Apologies, I probably didn't explain my question well. I've figured out how to get the solution I wanted. I realised that the problem was due to the dtypes for "id" and "time" were both object
. The KeyError: -1
problem was solved by ordering the rows in reverse and assigning a value to "time" in the very first row if it doesn't equal 0. I'm going to ignore any row before the first row with "id" of 0, so I used 999999.
Solution:
#sort timestamp in reverse
df=df.sort_values(by="timestamp",ascending=False)
#change dtype object to int64
df['id']=df['id'].astype(np.int64)
df['time']=df['id']
#assigning value to "time" in first row
if df.loc[0,'time']==0:
df.set_value(0,'time',0)
else:
df.set_value(0,'time',999999)
#using the previous row's value to evaluate "time"
for i, e in enumerate(df['time']):
if e==1:
df.at[i,'time']=df.at[i-1,'time']+10
#sort timestamp back in ascending order
df=df.sort_values(by="timestamp")