
I have a dataframe that looks like this:

import pandas as pd

data = [[1, 10, 100], [1.5, 15, 25], [7, 14, 70], [33, 44, 55]]
df = pd.DataFrame(data, columns=['A', 'B', 'C'])

When printed, it looks like this:

A    B    C
1    10   100
1.5  15   25
7    14   70
33   44   55

I also have other data: a random subset of rows from the dataframe, something like this:

set_of_rows = [[1,10,100], [33,44,55]]

I want to get the indices that give the location of each row of set_of_rows inside df. So I need a function that does something like this:

indices = func(subset=set_of_rows, dataframe=df)
In [1]: print(indices)
Out[1]: [0, 3]

What function can do this? Thanks.

NeStack
  • Are you trying to look up values and return indices? – The Singularity Aug 03 '21 at 11:21
  • @Luke Yes, something like this. As I wrote, I am trying to find where (at which indices) several rows are located within a dataframe – NeStack Aug 03 '21 at 11:23
  • Take a look at [this post](https://stackoverflow.com/questions/38674027/find-the-row-indexes-of-several-values-in-a-numpy-array) – SeaBean Aug 03 '21 at 12:06
  • @SeaBean Thanks, this helps! And it is similar to the answer by IoaTzimas, so I think I will just use his suggestion – NeStack Aug 03 '21 at 16:57

2 Answers


Try the following:

[i for i in df.index if df.loc[i].to_list() in set_of_rows]
#[0, 3]

If you want it as a function:

def func(set_of_rows, df):
    # Keep the index label of every df row whose values appear in set_of_rows
    return [i for i in df.index if df.loc[i].to_list() in set_of_rows]
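A quick check of why the list comparison works on the question's data, followed by a usage run. Because column A contains 1.5, each row returned by `df.loc[i]` is upcast to float64, so the match relies on Python treating `1.0 == 1` as equal. This sketch assumes exactly the data from the question; since it depends on exact float equality, values produced by arithmetic (e.g. `0.1 + 0.2`) might not match a literal `0.3`.

```python
import pandas as pd

data = [[1, 10, 100], [1.5, 15, 25], [7, 14, 70], [33, 44, 55]]
df = pd.DataFrame(data, columns=["A", "B", "C"])
set_of_rows = [[1, 10, 100], [33, 44, 55]]

# Column A holds 1.5, so each row Series is upcast to float64
first_row = df.loc[0].to_list()
print(first_row)  # [1.0, 10.0, 100.0]

# List equality compares element by element, and 1.0 == 1 in Python,
# so the float row still matches the integer row in set_of_rows
print(first_row in set_of_rows)  # True


def func(set_of_rows, df):
    # Keep the index label of every df row whose values appear in set_of_rows
    return [i for i in df.index if df.loc[i].to_list() in set_of_rows]


print(func(set_of_rows, df))  # [0, 3]
```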
IoaTzimas
  • Thanks! Isn't there an intrinsic pandas function for that? Your answer will work, but it looks like it might be slower than a pandas function built specifically for my task. Also, your solution will not handle mistakes well: e.g. if I erroneously have a row like `[1, 10, abc]` in `set_of_rows`, your solution will just skip it. That is fine, but it would be better if it gave me `NaN` or something like that. BTW, someone downvoted your answer and it wasn't me – NeStack Aug 03 '21 at 11:31
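This is not a built-in pandas function, only a plain-Python sketch that addresses both the speed and the `NaN` concerns. The helper name `find_row_indices` is hypothetical. It builds a dictionary from each row's values (as a tuple) to its index label once, so each requested row costs one hash lookup instead of a scan over the whole dataframe. Rows that are missing or malformed, such as `[1, 10, "abc"]`, map to `NaN`. It assumes all cell values are hashable and that, if `df` contains duplicate rows, the first index label should be reported.

```python
import math

import pandas as pd


def find_row_indices(set_of_rows, df):
    """Hypothetical helper: index label of each requested row, or NaN if absent."""
    # Build the lookup table once: tuple of row values -> first index label
    lookup = {}
    for idx, row in zip(df.index, df.itertuples(index=False, name=None)):
        lookup.setdefault(row, idx)  # keep the first label if a row repeats
    # Equal numbers hash equally (hash(1.0) == hash(1)), so integer rows
    # from set_of_rows find the float-containing rows of df
    return [lookup.get(tuple(r), math.nan) for r in set_of_rows]


data = [[1, 10, 100], [1.5, 15, 25], [7, 14, 70], [33, 44, 55]]
df = pd.DataFrame(data, columns=["A", "B", "C"])

print(find_row_indices([[1, 10, 100], [33, 44, 55], [1, 10, "abc"]], df))
# [0, 3, nan]
```

Building the dictionary is linear in the number of rows of `df`, after which each lookup is constant time on average, which matters when `set_of_rows` is large.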

You can check out this thread: Python Pandas: Get index of rows which column matches certain value

As far as I know, there is no intrinsic pandas function for your task, so iteration is the only way to go about it. If you are concerned about handling mistakes, you can add conditions to your loop that take care of them, for example by returning None for rows that are not found:

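def func(set_of_rows, df):
    indices = []
    for row in set_of_rows:
        found = None  # stays None if the row is not in df (e.g. a typo)
        for i in df.index:
            if df.loc[i].to_list() == row:
                found = i
                break
        indices.append(found)
    return indices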
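A variant that keeps the loop over `set_of_rows` but lets pandas do the row comparison: `df == row` compares each column of `df` with the matching element of `row`, and `.all(axis=1)` is True only for rows where every column matches. This is a sketch with a hypothetical helper name, following the idea of returning `None` for rows that are not found. A row with the wrong number of values makes `df == row` raise a `ValueError`, so that kind of mistake is reported instead of silently skipped.

```python
import pandas as pd


def find_row_indices(set_of_rows, df):
    """Hypothetical helper: first matching index label per row, or None."""
    result = []
    for row in set_of_rows:
        # Compare every df row with `row` column by column, then keep
        # True only where all columns of that df row match
        mask = (df == row).all(axis=1)
        hits = df.index[mask.to_numpy()].tolist()
        result.append(hits[0] if hits else None)
    return result


data = [[1, 10, 100], [1.5, 15, 25], [7, 14, 70], [33, 44, 55]]
df = pd.DataFrame(data, columns=["A", "B", "C"])

print(find_row_indices([[1, 10, 100], [33, 44, 55], [2, 20, 200]], df))
# [0, 3, None]
```

Each comparison is vectorized across the rows of `df`, though the total work still grows with `len(set_of_rows) * len(df)`.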
Tasbiha