a faster way than the isin function of Pandas to extract conditional rows

Question

I have a very large knowledge graph in pandas dataframe format as follows.

This dataframe KG has more than 100 million rows.

KG:

                   pred     subj      obj
        0   nationality     BART      USA
        1  placeOfBirth     BART  NEWYORK
        2     locatedIn  NEWYORK      USA
      ...           ...      ...      ...
116390740     hasFather     BART   HOMMER
116390741   nationality   HOMMER      USA
116390743  placeOfBirth   HOMMER  NEWYORK

I tried to get the rows from this KG that have specific values for subj.

Using the subj column as a Series, I tried indexing KG with a boolean Series generated by the isin() function, as shown below.

KG[KG['subj'].isin(['BART', 'NEWYORK'])]

My desired output is

                   pred     subj      obj
        0   nationality     BART      USA
        1  placeOfBirth     BART  NEWYORK
        2     locatedIn  NEWYORK      USA
116390740     hasFather     BART   HOMMER

I have to repeat the above

But the above method takes a long time. Is there a faster way to do this?

thanks!

Does this answer your question? [A faster alternative to Pandas `isin` function](https://stackoverflow.com/questions/23945493/a-faster-alternative-to-pandas-isin-function) – dm2, May 09 '21 at 09:52

Nk03 · Answer 1 · 2021-05-09T12:44:47.980

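You can set subj as the index, sort it, and then pick the required values (here df is the KG dataframe). Looking up rows by index value is generally faster than looking them up by column value, and faster still when the index is sorted. Setting and sorting the index is a one-time cost, so it pays off mainly when the lookup is repeated.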

df = df.set_index('subj')
df = df.sort_index()
result = df.loc[['BART', 'NEWYORK']]
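
To make the lookup above concrete, here is a minimal sketch on a six-row stand-in for KG. The sample rows, the `lookup` helper, and the `row` column name are assumptions for illustration and are not part of the original answer. The sketch keeps the original row numbers in a column so the result can be compared with the isin version, and it filters out keys that are not in the index first, since `.loc` with a list raises a KeyError for missing labels.

```python
import pandas as pd

# Tiny stand-in for the real KG (sample rows are illustrative only).
KG = pd.DataFrame(
    {
        "pred": ["nationality", "placeOfBirth", "locatedIn",
                 "hasFather", "nationality", "placeOfBirth"],
        "subj": ["BART", "BART", "NEWYORK", "BART", "HOMMER", "HOMMER"],
        "obj": ["USA", "NEWYORK", "USA", "HOMMER", "USA", "NEWYORK"],
    }
)

# One-time cost: keep the original row number in a column named "row"
# (hypothetical name), then index by subj and sort the index.
indexed = KG.rename_axis("row").reset_index().set_index("subj").sort_index()


def lookup(keys):
    """Return the rows whose subj is in keys, in original row order."""
    # Drop keys that are absent from the index to avoid a KeyError in .loc.
    present = [k for k in keys if k in indexed.index]
    hits = indexed.loc[present]
    # Restore the original row numbers and column order.
    return hits.reset_index().set_index("row").sort_index()[list(KG.columns)]


result = lookup(["BART", "NEWYORK", "MISSING"])
expected = KG[KG["subj"].isin(["BART", "NEWYORK"])]
print(result)
```

Whether this beats isin on 100 million rows depends on how often the lookup is repeated and how many keys are requested each time, so it is worth timing both on the real data.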

For conditions on both subj and obj, you can try query after setting a MultiIndex on the two columns:

df = df.set_index(['subj','obj'])
df = df.sort_index()
df.query("subj in ['BART','NEWYORK'] & obj in ['USA','HOMMER']")
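
As a quick check of the MultiIndex approach, the sketch below runs the same kind of query on a six-row stand-in for KG and compares it with the two-condition isin mask from the comment. The sample rows are an assumption for illustration, and the sketch writes `and` instead of `&` inside the query string purely for readability.

```python
import pandas as pd

# Tiny stand-in for the real KG (sample rows are illustrative only).
KG = pd.DataFrame(
    {
        "pred": ["nationality", "placeOfBirth", "locatedIn",
                 "hasFather", "nationality", "placeOfBirth"],
        "subj": ["BART", "BART", "NEWYORK", "BART", "HOMMER", "HOMMER"],
        "obj": ["USA", "NEWYORK", "USA", "HOMMER", "USA", "NEWYORK"],
    }
)

# One-time cost: index by (subj, obj) and sort.
indexed = KG.set_index(["subj", "obj"]).sort_index()

# Index level names can be referenced inside query like column names.
hits = indexed.query("subj in ['BART', 'NEWYORK'] and obj in ['USA', 'HOMMER']")

# Move subj and obj back into columns, in the original column order.
result = hits.reset_index()[list(KG.columns)]

# The boolean-mask version from the comment, for comparison.
expected = KG[KG["subj"].isin(["BART", "NEWYORK"]) & KG["obj"].isin(["USA", "HOMMER"])]
print(result)
```

The rows come back in sorted index order rather than the original row order, so compare the two results as sets of rows rather than position by position.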

edited May 09 '21 at 12:44

answered May 09 '21 at 10:05

Nk03

Is there any other way to deal with the following conditions? `KG[KG['subj'].isin(['BART', 'NEWYORK']) & KG['obj'].isin(['USA', 'HOMMER'])]` – Won chul Shin May 09 '21 at 11:33