2

I've a question regarding some code in python. I'm trying to extract the index of the first row when the condition TRUE is satisfied in 3 different columns. This is the data I'm using:

       0       1       2             3           4
0   TRUE    TRUE    TRUE    0.41871395  0.492517879
1   TRUE    TRUE    TRUE    0.409863582 0.519425031
2   TRUE    TRUE    TRUE    0.390077415 0.593127232
3   FALSE   FALSE   FALSE   0.372020631 0.704367199
4   FALSE   FALSE   FALSE   0.373546556 0.810876797
5   FALSE   FALSE   FALSE   0.398876919 0.86855678
6   FALSE   FALSE   FALSE   0.432142094 0.875576037
7   FALSE   FALSE   FALSE   0.454115421 0.863063448
8   FALSE   TRUE    FALSE   0.460676901 0.855739006
9   FALSE   TRUE    FALSE   0.458693197 0.855128636
10  FALSE   FALSE   FALSE   0.459201839 0.856451104
11  FALSE   FALSE   FALSE   0.458693197 0.855739006
12  FALSE   FALSE   FALSE   0.458082827 0.856349376
13  FALSE   FALSE   FALSE   0.456556902 0.856959746
14  TRUE    TRUE    TRUE    0.455946532 0.858180486
15  TRUE    TRUE    TRUE    0.455030976 0.858790857
16  TRUE    TRUE    TRUE    0.454725791 0.858485672
17  FALSE   FALSE   FALSE   0.454420606 0.857875301
18  FALSE   FALSE   FALSE   0.454725791 0.858383943
19  FALSE   TRUE    FALSE   0.453199866 0.856654561
20  FALSE   FALSE   FALSE   0.451979125 0.856349376
21  FALSE   FALSE   FALSE   0.45167394  0.856959746
22  FALSE   FALSE   FALSE   0.451775669 0.857570116
23  FALSE   FALSE   FALSE   0.45106357  0.857264931
24  TRUE    TRUE    TRUE    0.450758385 0.856654561
25  TRUE    TRUE    TRUE    0.4504532   0.856044191
26  TRUE    TRUE    TRUE    0.449232459 0.856349376
27  TRUE    TRUE    TRUE    0.448316904 0.855535549

and I need to get the index number only when there are 3 'True' conditions:

0   
14  
24

Thank you!

Gius
  • 513
  • 6
  • 15

7 Answers7

1

EDIT 2: I have a new answer in response to some clarifications.

You're looking for each row that has TRUE in columns 0, 1, or 2, BUT you'd like to ignore such rows that are not the first in a streak of them. The first part of my answer is still the same, I think you should create a mask that selects your TRUE triplet rows:

condition = df[[0, 1, 2]].all(axis='columns')

But now I present a possible way to filter out the rows you want to ignore. To be not-first in a streak of TRUE triplet rows means that the previous row also satisfies condition.

idx = df[condition].index
ignore = idx.isin(idx + 1)
result = idx[~ignore]

In other words, ignore rows where the index value is the successor of an index value satisfying condition.

Hope this helps!


Keeping my original answer for record keeping:

I think you'll end up with the most readable solution by breaking this out into two steps:

First, find out which rows have the value True for all of the columns you're interested in:

condition = df[[0, 1, 2]].all(axis='columns')

Then, the index values you're interested in are simply df[condition].index.

EDIT: if, as Benoit points out may be the case, TRUE and FALSE are strings, that's fine, you just need a minor tweak to the first step:

condition = (df[[0, 1, 2]] == 'TRUE').all(axis='columns')
wbadart
  • 2,583
  • 1
  • 13
  • 27
  • This does not generate the expected output : `[0, 14, 24]` – Ricky Kim Aug 21 '19 at 16:57
  • 1
    I think I, and the other answerers, are confused because your explanation doesn't line up with your expected answer. You say "I'm trying to extract the index of the first row when the condition TRUE is satisfied in 3 different columns", in which case you could take the `head` of `df[condition]` for your answer. I see now that what you *really* want is the first row of each streak of `TRUE` triplet rows. I'll workshop my response accordingly. – wbadart Aug 21 '19 at 17:45
  • @RickyKim see EDIT2 – wbadart Aug 21 '19 at 18:01
1

I guess everyone missed the "extract the index of the first row" part. One of the way would be removing consecutive duplicates first and then obtaining index where all three is True so that you only get first row of the truth

df=df[['0', '1', '2']]
df=df[df.shift()!=df].dropna().all(axis=1)
print(df[df].index.tolist())

OUTPUT:

[0, 14, 24]
Ricky Kim
  • 1,992
  • 1
  • 9
  • 18
1

I tried this on a demo dataframe and it seems to work for me.



df = pd.DataFrame(data={'A':[True,True,True,True,True,False,True,True],'B':[True,True,False,True,True,False,True,True],'C':[True,False,True,True,True,False,True,True]})
i =df[(df['A']==True) & (df['B']==True) & (df['C']==True)].index.to_list()

i = [x for x in i if x-1 not in i]

Parijat Bhatt
  • 664
  • 4
  • 6
1

If the TRUE and FALSE in your DataFrame are actually the boolean values True and False then,

#This will look at the first 3 columns and return True if "all" are True else it will return False:

step1 = [all(q) for q in df[[0,1,2]].values]

id = []
cnt = 0
temp_cnt = 0

#this loop finds where the value is true and checks if the next 2 are also true 
#it then appends the count-2 to a list named id, the -2 compensates for the index.
for q in step1:
    if q:
        cnt += 1
        if cnt == 3:
            id.append(temp_cnt - 2)
    else:
        cnt = 0

    temp_cnt += 1


#Then when printing "id" it will return the first index where AT LEAST 3 True values occur in sequence.
id
Out[108]: [0, 14, 24]
J91
  • 33
  • 6
1

I think this could do the trick. As a general advice though, it always helps to name the columns in pandas.

Say that your pandas data frame is named data:

data[(data[0] == True) & (data[1] == True) & (data[2] == True)].index.values

or

list(data[(data[0] == True) & (data[1] == True) & (data[2] == True)].index.values)
ats
  • 141
  • 6
0

The following will work regardless of the position of the 3 columns you wish to check for True values, and gives you back a list indicating which rows have 3 True values present:

Edit:

Now updated to better align with the OP's original request:

#df.iloc[:,:3] = df.iloc[:,:3].apply(lambda x: str(x) == "TRUE")    # If necessary
s = (df == True).apply(sum, axis=1) == 3
s = s[s.shift() != s]
s.index[s].tolist()
Ted
  • 1,189
  • 8
  • 15
  • There are some more elegant answers here, certainly, but any downvoters care to comment? – Ted Aug 21 '19 at 16:04
  • What if there are other boolean columns? I believe your answer is way too coupled to the specific given dataframe – Itay Aug 21 '19 at 16:26
  • @Itay OP's original request was "when the condition TRUE is satisfied in 3 different columns." Although one could infer from the given data that the first 3 columns were what was meant, it is equally likely that the question was a general one about a condition being `True` in any 3 columns per row in a given dataframe. – Ted Aug 21 '19 at 16:29
  • 1
    Okay, you convinced me. – Itay Aug 21 '19 at 16:31
  • This does not generate the expected output : `[0, 14, 24]` – Ricky Kim Aug 21 '19 at 16:57
  • I agree with @wbadart that the original explanation didn't really line up with the expected output. I've updated my answer accordingly. – Ted Aug 22 '19 at 07:17
0

Based on the answer here, something like this will provide a list of indices for the rows that meet all conditions:

df[(df[0]==True) & (df[1]==True) & (df[2]==True)].index.tolist()

Korean_Of_the_Mountain
  • 1,428
  • 3
  • 16
  • 40
  • This does not generate the expected output : `[0, 14, 24]` – Ricky Kim Aug 21 '19 at 16:57
  • @RickyKim has the right answer. I totally skipped over the part where OP only wanted the index of first group of rows that met conditions. I tried to think of a way without using `shift()` to be more original, but couldn't come up with anything outside of inefficiently looping over each row. – Korean_Of_the_Mountain Aug 21 '19 at 18:03