
I have a dataframe of the following form where the index is a datetime value:

| Date_Event | Col1 | Col2 | Col3 |
|------------|------|------|------|
| 15/01/2017 | 0.55 | 0.23 | 0.75 |
| 15/02/2017 | 0.17 | 0.11 | 0.07 |
| 15/03/2017 | 0.78 | 0.93 | 0.98 |
| 15/04/2017 | 0.65 | 0.13 | 0.19 |
| 15/05/2017 | 0.20 | 0.40 | 0.70 |
| 15/06/2017 | 0.28 | 0.31 | 0.79 |

I would like to get the row (in short, the date) with the minimum value in every column, i.e. to find a point whose values are lower than those of all the other points.

The expected answer is 15/02/2017, since 0.17 is the smallest value in Col1, 0.11 the smallest in Col2, and 0.07 the smallest in Col3.

My outside guess would be to use a lambda function, but I will leave it to you experts.

Thank you in advance.

Community
  • What have you tried so far? I see no code in your question; please do some research, try some code, and then ask questions if you can't solve the problem. – DarkCygnus Jun 20 '17 at 23:59
  • I know a sure-shot way, which is to use df.cummin(), but it kind of gives me the entire dataframe. I still need to find the range of dates where the values were at a minimum. Also, .cummin() gives me the cumulative minimum of each column independently; I would like to extract the row where the values were at a minimum across all columns relative to the other points (see the pandas sketch below). – Vaibhav Deorukhkar Jun 21 '17 at 01:06
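
A minimal pandas sketch of that idea (not from the thread; the frame construction here is assumed, mirroring the question's table): compare each row against the column-wise minima and keep the dates where every column hits its minimum at once.

```python
import pandas as pd

# Hypothetical frame mirroring the question's data
df = pd.DataFrame(
    {"Col1": [0.55, 0.17, 0.78, 0.65, 0.20, 0.28],
     "Col2": [0.23, 0.11, 0.93, 0.13, 0.40, 0.31],
     "Col3": [0.75, 0.07, 0.98, 0.19, 0.70, 0.79]},
    index=pd.to_datetime(["15/01/2017", "15/02/2017", "15/03/2017",
                          "15/04/2017", "15/05/2017", "15/06/2017"],
                         format="%d/%m/%Y"),
)
df.index.name = "Date_Event"

# A date qualifies only if it holds the minimum of every column at once
mask = df.eq(df.min()).all(axis=1)
print(df.index[mask])  # -> the 2017-02-15 row for this sample
```

If no single date is the minimum in every column, the resulting index is simply empty, which mirrors the 'None' case handled in the answer below.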

1 Answer


You can first load your data file into a NumPy ndarray as follows:

import numpy as np

# Read the pipe-delimited file as byte strings, skipping the header row
data = np.genfromtxt('dataframe.txt', delimiter='|', dtype='S', skip_header=1)

To obtain the index you can do:

initial_row = np.array([1.0, 1.0, 1.0])  # initial values; must exceed every value in the data (here all values are < 1)
for row in data:
    # row[0] is the date field, the remaining fields are the numeric columns
    float_row = np.array(row[1:], dtype='float')
    # keep this row only if it is lower than the current best in every column
    if all(float_row < initial_row):
        initial_row = float_row
        index = row[0]
if all(initial_row == np.array([1.0, 1.0, 1.0])):
    index = 'None'  # no row was lower than the initial values in every column

If `index == 'None'`, then there is no row in your data whose value in every column is smaller than in all the other rows; otherwise `index` holds the date of that row.
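
As a rough usage sketch (the file name and its contents are assumed here, mirroring the question's table), the snippet above could be exercised like this:

```python
import numpy as np

# Write a hypothetical pipe-delimited file matching the question's table
with open('dataframe.txt', 'w') as f:
    f.write("Date_Event| Col1 | Col2 | Col3\n"
            "15/01/2017 | 0.55 | 0.23 | 0.75\n"
            "15/02/2017 | 0.17 | 0.11 | 0.07\n"
            "15/03/2017 | 0.78 | 0.93 | 0.98\n"
            "15/04/2017 | 0.65 | 0.13 | 0.19\n"
            "15/05/2017 | 0.20 | 0.40 | 0.70\n"
            "15/06/2017 | 0.28 | 0.31 | 0.79\n")

data = np.genfromtxt('dataframe.txt', delimiter='|', dtype='S', skip_header=1)
# Running the loop above on `data` should leave `index` holding the
# date field of the 15/02/2017 row, since it is lowest in every column.
```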

rodgdor
  • I believe there is an indentation problem with your `else:`; I couldn't edit it because the edit is not long enough to be accepted. – DarkCygnus Jun 21 '17 at 22:47
  • 1
    @GrayCygnus It's not a typo, the `else` indented at the level of the `for` loop is only 'passed' when the `if` statement is False for each iteration. See [this](https://stackoverflow.com/questions/9979970/why-does-python-use-else-after-for-and-while-loops) related question on the matter. – rodgdor Jun 21 '17 at 22:58
  • In this example, that `else` statement ensures that `index` will always have a value. If it is 'None' then no row of your data has the three columns smaller than the rest. Otherwise, index is equal to the 'date' of the minimal row. – rodgdor Jun 21 '17 at 23:00
  • Nice :) I was not aware Python supported that construct, thanks for the knowledge. However, checking the question you linked, a comment there mentions that the `else` is only relevant if you add a `break` inside the `for` loop when the desired column is found. Otherwise, it will always go to the `else` construct regardless of whether you got a valid column (a small sketch illustrating this follows these comments). – DarkCygnus Jun 21 '17 at 23:35
  • Quoting that comment: "I think the real question many people have here is "What's the difference between for ... else foo() and just putting foo() after the for loop?" And the answer is that they behave differently **only if** the loop contains a **break** (as described in detail below). – Sam Kauffman" – DarkCygnus Jun 21 '17 at 23:37
  • @GrayCygnus You are totally right. My bad. I'll edit the code. Thanks for the info :) – rodgdor Jun 21 '17 at 23:52
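
To make the `for`/`else` point from these comments concrete, here is a small generic sketch (not the answer's code): the `else` branch runs only when the loop finishes without hitting a `break`.

```python
rows = [("15/01/2017", [0.55, 0.23, 0.75]),
        ("15/02/2017", [0.17, 0.11, 0.07])]

for date, values in rows:
    if all(v < 0.2 for v in values):  # arbitrary condition for illustration
        print("found", date)
        break                          # skips the else branch below
else:
    print("no matching row")           # runs only if the loop never broke
```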