
I have a set of time series data and I am trying to find the point at which each series reaches a specific value. I tried the code below, which gets me close to the answer but returns the whole column rather than the specific value in the time column, and I am unsure how to iterate over several columns.

num = df['Time column'].where(df['Data Column'] > value_of_interest)
print(num)

Any help is greatly appreciated.

David
  • You probably want to use .loc instead of .where. See: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html – n4321d Oct 19 '22 at 16:31
  • Hello. Can you share an example of roughly what your data looks like, what output you get, and what the ideal output should look like? – Daraan Oct 19 '22 at 18:23

2 Answers


With DataFrame.loc you can find the rows that satisfy your condition. Check the example below:

mock_time_series.csv

time,v1,v2     
100,10,20
200,20,22
300,14,12
400,33,22
500,43,62

import pandas as pd

df = pd.read_csv('mock_time_series.csv')

By using .loc:

df.loc[df['v1'] >= 20]['time'].values

You will get as a response:

array([200, 400, 500])
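As a side note, the column label can also be passed inside the same .loc call, which avoids the chained indexing step and should return the same array (a small sketch with the same mock data):

# same selection, with the column label inside the single .loc indexer
df.loc[df['v1'] >= 20, 'time'].values
# array([200, 400, 500])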

Likewise, for the 'v2' column:

df.loc[df['v2'] >= 20]['time'].values

The respective time list is as follows:

array([100, 200, 400, 500])

Then, you can take the intersection of both arrays if you want the times that satisfy both conditions (list intersection, from this response):

# intersection of lists: list(set(a) & set(b))
list(set(df.loc[df['v1'] >= 20]['time'].values) & set(df.loc[df['v2'] >= 20]['time'].values))

The resulting list will be as follows:

[200, 500, 400]
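If you prefer to stay within pandas rather than converting to Python sets, a combined boolean mask should give the same rows (a small sketch on the same mock data):

# intersection expressed as a single boolean mask over both columns
df.loc[(df['v1'] >= 20) & (df['v2'] >= 20), 'time'].values
# array([200, 400, 500])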
mdsw

Thank you very much for the help. This got me on the right path. It does output an array, so it's not the cleanest way to do this, but it's easy enough to see which lines reach the point and when:

import pandas as pd

df = pd.read_excel("file location")
for column_name in df:
    # times at which this column reaches the threshold
    array = df.loc[df[column_name] >= 10]['Time'].values
    # first such time (empty array if the threshold is never reached)
    first_number = array[:1]
    print(column_name, first_number)
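Since the arrays make the output a bit noisy, here is a minimal sketch of one way to print a single time per column instead, assuming the same 'Time' column name and threshold of 10; skipping the 'Time' column itself is my addition:

import pandas as pd

df = pd.read_excel("file location")  # placeholder path, as in the loop above
threshold = 10                        # same threshold used above

for column_name in df.columns.drop('Time'):  # skip the time column itself
    hits = df.loc[df[column_name] >= threshold, 'Time']
    # earliest matching time as a scalar, or None if the threshold is never reached
    print(column_name, hits.min() if not hits.empty else None)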
David