0

I have a 2 columns dataframe. The first column indicates a timestamp. The second column has values calculated (the first few rows have no value on purpose).

df = pd.DataFrame({'timestamp': [123, 456, 789, 101112, 131415, 161718, 192021, 222324, 252627, 282930, 313233, 343536], 
                   'value': ["", "", "", 68, 47, 62, 62, 62, 62, 54, 54 , 54]})

I am trying to iterate through the 'value' in order to create a third colum that indicates "direction".

if 'value' is empty then direction is empty

if 'value' is > previous_value then direction = "positive"

if 'value' is < previous_value then direction = "negative"

if 'value" = previous_value AND previous_direction is "positive" then direction = "positive"

if 'value" = previous_value AND previous_direction is "negative" then direction = "negative"

The problem that I have is that I can't inspect each colum simultaneously in what I have tried to far.

I have tried looping with something like that

for i in range(len(df)):
    if i == 0 :
        if df.iloc[i-1,'value'] > df.iloc[i,'value']:
           df.iloc[i,'direction'] = "positive"

I get an error. ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

I have tried looping with something like that also

for i in range(1, len(df) + 1):
    col = 'value'
    j = df.columns.get_loc('direction')
    if (df[col] > (df[col].shift(1))):
        df.loc[i - 1, j] = "positive"
    elif (df[col] < (df[col].shift(1))):
        df.loc[i - 1, j] = "negative"

I get another ValueError ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have tried working with np.where - I don't get errors, but I don't get what I am trying to do either.

df['positive']   = np.where( (df['value'] > df['value'].shift(1)) , "1",  "")
df['negative'] = np.where( (df['value'] < df['value'].shift(1)) , "1", "")

df['positive2']   = np.where( (df['positive'].shift(1)   =="1")  & (df['value'] == df['value'].shift(1)), "1", "0")
df['negative2'] = np.where( (df['negative'].shift(1) =="1") & (df['value'] == df['value'].shift(1)), "1","0")

Anyone has an idea?

chouchou
  • 23
  • 3
  • Does this answer your question? [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – Siddhant Kadwe Feb 17 '23 at 13:40

1 Answers1

1

You can try using iterrows method for the dataframe.

Here is the sample code for the problem you are trying to solver:

import pandas as pd

df = pd.DataFrame({'timestamp': [123, 456, 789, 101112, 131415, 161718, 192021, 222324, 252627, 282930, 313233, 343536], 
                    'value': ["", "", "", 68, 47, 62, 62, 62, 62, 54, 54 , 54]})

df['value'] = pd.to_numeric(df['value'], errors='coerce').fillna(0)

df['direction'] = ""

for index, row in df.iterrows():

# check if the current value is empty
if not row['value']:
    df.loc[index, 'direction'] = ""
    continue
  
if index != 0:
    # get the previous value and direction
    previous_value = df.loc[index - 1, 'value']
    previous_direction = df.loc[index - 1, 'direction']
    # check the conditions and set the direction
    if row['value'] > previous_value:
        df.loc[index, 'direction'] = "positive"
    elif row['value'] < previous_value:
        df.loc[index, 'direction'] = "negative"
    elif row['value'] == previous_value and previous_direction == "positive":
        df.loc[index, 'direction'] = "positive"
    elif row['value'] == previous_value and previous_direction == "negative":
        df.loc[index, 'direction'] = "negative"
else:
    df.loc[index, 'direction'] = "positive"

print(df)
    

With this, you can easily iterate though the complete dataframe as shown above.

You can find more information here: pandas.DataFrame.iterrows

I hope this helped!