0

I have timestamped data in a csv with a resolution of 10 minutes, as shown in this image. Each timestamp has an "id" which is either 1 or 0. I am trying to create a new column, "time", which returns 0 if "id" = 0, otherwise it should return the value of "time" in the next row plus 10. I am able to do this easily on Excel, and the formula I used is shown in the image.

I am trying to use Python to do the same thing. After importing the csv data as a dataframe using pandas, I had a look at this link and this link. I copied the "id" column to a new column, "time" and used the following code, but the values of "time" remain exactly the same (1 or 0).

for i, e in enumerate(df['time']):
    if e==1:
        df.at[i,'time']=df.at[i+1,'time']+10

Based on the solution in the first link, I tried changing the value of '1' to null and using if pd.isnull(e), but ended up getting TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'.

I attempted it again by ordering the rows in reverse, changing i+1 to i-1 but it didn't work either, and I was getting KeyError: -1.

Does anyone have any suggestions on how I can do this in Python?

Edit:

Apologies, I probably didn't explain my question well. I've figured out how to get the solution I wanted. I realised that the problem was due to the dtypes for "id" and "time" were both object. The KeyError: -1 problem was solved by ordering the rows in reverse and assigning a value to "time" in the very first row if it doesn't equal 0. I'm going to ignore any row before the first row with "id" of 0, so I used 999999.

Solution:

#sort timestamp in reverse
df=df.sort_values(by="timestamp",ascending=False)

#change dtype object to int64
df['id']=df['id'].astype(np.int64)
df['time']=df['id']

#assigning value to "time" in first row
if df.loc[0,'time']==0:
    df.set_value(0,'time',0)
else:
    df.set_value(0,'time',999999)

#using the previous row's value to evaluate "time"
for i, e in enumerate(df['time']):
    if e==1:
        df.at[i,'time']=df.at[i-1,'time']+10

#sort timestamp back in ascending order
df=df.sort_values(by="timestamp")
nms
  • 60
  • 8

1 Answers1

0

This should do the trick, but you didn't specify how you want to handle NaN values which will happen in your question

import pandas as pd

df = pd.DataFrame({'id':[1,1,0,0,1,0,1], 'nums':[10,20,50,10,10,30,20]})


   id  nums
0   1    10
1   1    20
2   0    50
3   0    10
4   1    10
5   0    30
6   1    20


df['nums'] = pd.concat([df, df['nums'].shift(-1)], axis=1).apply(lambda x: 0 if x['id'] == 0 else x[[2]]+10, axis=1)

   id  nums
0   1  30.0
1   1  60.0
2   0   0.0
3   0   0.0
4   1  40.0
5   0   0.0
6   1   NaN

You can just add a fillna(0) to the end of the one-liner if you want to fill the NaN values.

gold_cy
  • 13,648
  • 3
  • 23
  • 45
  • Hi Dmitry. Thanks for the suggestion. Apologies, I don't think I have explained my question well and this wasn't exactly what I was looking for. However, your solution did help me think about my method and allowed me to come up with a solution, which I have posted as an edit :) – nms Jun 19 '17 at 15:42