0

Let's assume we have a pandas DataFrame with three features, as shown below.


Each row represents a customer, and each column represents a feature of that customer.

I would like to get the row numbers and add them to a list, or leave them out, depending on their feature values.

Let's say we would like to find the row numbers where feature1 is less than 100 or feature2 is greater than 500.

I have written some code for this as you can see below.

import pandas as pd

d = [{'feature1': 100, 'feature2': 520, 'feature3': 54},
     {'feature1': 102, 'feature2': 504, 'feature3': 51},
     {'feature1': 241, 'feature2': 124, 'feature3': 4},
     {'feature1': 340, 'feature2': 830, 'feature3': 700},
     {'feature1': 98, 'feature2': 430, 'feature3': 123}]

df = pd.DataFrame(d)
print(df)
print("----")

dataframe1 = df[(df['feature1'] < 100)]
dataframe2 = df[(df['feature2'] > 500)]

print(dataframe1)
print(dataframe2)
# here I would like to get the row numbers and add them to a result list

Output of the program

      feature1  feature2  feature3
0       100       520        54
1       102       504        51
2       241       124         4
3       340       830       700
4        98       430       123
----
   feature1  feature2  feature3
4        98       430       123
   feature1  feature2  feature3
0       100       520        54
1       102       504        51
3       340       830       700

I could not figure out how to combine dataframe1 and dataframe2 and then get their row numbers. Could you please share how to do it, if you know?

I would like to see a result list like this:

result = [ 4, 0, 1, 3]
clockworks
  • Possible duplicate of [Python Pandas: Get index of rows which column matches certain value](https://stackoverflow.com/questions/21800169/python-pandas-get-index-of-rows-which-column-matches-certain-value) – Mako212 Sep 15 '17 at 21:31
  • Per the above link: `df.index[(df['feature2'] > 500 )].tolist()` – Mako212 Sep 15 '17 at 21:37
  • I have edited the question. I am also trying to figure out how to combine these two dataset. – clockworks Sep 15 '17 at 21:44

3 Answers

1

Not real clear... but maybe this:

df.query('feature1 < 100 | feature2 > 500').index.tolist()

[0, 1, 3, 4]
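
For comparison, the same selection can be written with a boolean mask instead of a query string. This is only a sketch that rebuilds the DataFrame from the question; the names `mask` and `result` are made up for illustration. Note that the result comes back in index order, not in the `[4, 0, 1, 3]` order the question shows.

```python
import pandas as pd

# Same sample data as in the question
d = [{'feature1': 100, 'feature2': 520, 'feature3': 54},
     {'feature1': 102, 'feature2': 504, 'feature3': 51},
     {'feature1': 241, 'feature2': 124, 'feature3': 4},
     {'feature1': 340, 'feature2': 830, 'feature3': 700},
     {'feature1': 98, 'feature2': 430, 'feature3': 123}]
df = pd.DataFrame(d)

# Element-wise OR of the two conditions; the parentheses are required
# because | binds more tightly than < and >
mask = (df['feature1'] < 100) | (df['feature2'] > 500)

# Index labels of the rows where the mask is True
result = df.index[mask].tolist()
print(result)  # [0, 1, 3, 4]
```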
piRSquared
1

How about like this?

ls = []

ls.extend(df.index[(df['feature1'] < 100 )])
ls.extend(df.index[(df['feature2'] > 500 )])

print(ls)
[4, 0, 1, 3]
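
One thing to watch with this approach: a row that satisfies both conditions ends up in the list twice. The sketch below adds a hypothetical sixth customer (not in the question's data) that matches both conditions, then removes the duplicate while keeping the first-seen order. The name `unique` is made up for illustration.

```python
import pandas as pd

d = [{'feature1': 100, 'feature2': 520, 'feature3': 54},
     {'feature1': 102, 'feature2': 504, 'feature3': 51},
     {'feature1': 241, 'feature2': 124, 'feature3': 4},
     {'feature1': 340, 'feature2': 830, 'feature3': 700},
     {'feature1': 98, 'feature2': 430, 'feature3': 123},
     # Hypothetical extra row that satisfies BOTH conditions
     {'feature1': 50, 'feature2': 900, 'feature3': 1}]
df = pd.DataFrame(d)

ls = []
ls.extend(df.index[df['feature1'] < 100].tolist())
ls.extend(df.index[df['feature2'] > 500].tolist())
print(ls)  # [4, 5, 0, 1, 3, 5] -> row 5 appears twice

# dict keys keep insertion order, so this drops repeats
# without reordering the remaining entries
unique = list(dict.fromkeys(ls))
print(unique)  # [4, 5, 0, 1, 3]
```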
Mako212
0

You want to output the index as a list.

print(df[df['feature2'] > 500].index.tolist())

[0, 1, 3]
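
Keep in mind that `.index` returns index labels, which only match row numbers when the DataFrame has the default 0, 1, 2, ... index. If the index is something else, positional row numbers can be obtained with `numpy.flatnonzero`. The sketch below assumes a made-up letter index to show the difference and applies both conditions from the question.

```python
import numpy as np
import pandas as pd

d = [{'feature1': 100, 'feature2': 520, 'feature3': 54},
     {'feature1': 102, 'feature2': 504, 'feature3': 51},
     {'feature1': 241, 'feature2': 124, 'feature3': 4},
     {'feature1': 340, 'feature2': 830, 'feature3': 700},
     {'feature1': 98, 'feature2': 430, 'feature3': 123}]
# Hypothetical non-default index, for illustration only
df = pd.DataFrame(d, index=['a', 'b', 'c', 'd', 'e'])

mask = (df['feature1'] < 100) | (df['feature2'] > 500)

# Index labels of the matching rows
labels = df.index[mask].tolist()
print(labels)  # ['a', 'b', 'd', 'e']

# Zero-based row positions of the matching rows
positions = np.flatnonzero(mask.to_numpy()).tolist()
print(positions)  # [0, 1, 3, 4]
```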
Shawn Mehan