I am trying to search for a keyword in a pandas DataFrame. Currently I use the isin() method to search the entire DataFrame. It works correctly, but it takes a long time on large datasets exceeding 1 GB.

The problem I am addressing is:

Suppose I have a dataset df:

Player_Name      Country       Type_of_sports

Messi            Argentina     Football
Ronaldo          Portugal      Football
Kohli            India         Cricket
Federer          Switzerland   Tennis

Column names: Player_Name, Country, Type_of_sports

Suppose a user enters a query, for example:

query = 'Which country is Messi from ?'

The keyword in this query is Messi.

So I need to search for Messi in the entire DataFrame.

Is there an efficient way to find such values in a DataFrame without using a for loop or the isin() method?

Note: the query will not always contain the exact column name.

For example: new_query = 'Name of players playing football '.

Here I need to search for the keyword Football in the entire DataFrame. Is there a way to search for Football without using a for loop or the isin() method?

Thank you

shre2306
  • you mean `df[df.Player_Name.isin(query.split())]` ?? how dynamic can the query be – anky Mar 21 '19 at 05:57
  • Sorry for not defining the types of queries properly. I have edited my question. Please look at the changes and let me know if you have any suggestions. Thanks @anky_91 – shre2306 Mar 21 '19 at 06:33

1 Answer

To efficiently answer that query, you can use the following:

df.loc[df.Player_Name == 'Messi', 'Country']
0    Argentina
Name: Country, dtype: object

If the given player name does not exist in that column, there are no matches and an empty Series is returned.
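As a minimal sketch of that lookup with the no-match case handled explicitly: the sample frame is rebuilt from the table in the question, and the helper name `country_of` is hypothetical, chosen only for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Player_Name": ["Messi", "Ronaldo", "Kohli", "Federer"],
    "Country": ["Argentina", "Portugal", "India", "Switzerland"],
    "Type_of_sports": ["Football", "Football", "Cricket", "Tennis"],
})

def country_of(frame, player):
    """Return the first matching country, or None when the name is absent.

    Hypothetical helper: the boolean mask is evaluated for the whole
    column at once, so no explicit Python loop is needed.
    """
    matches = frame.loc[frame["Player_Name"] == player, "Country"]
    return None if matches.empty else matches.iloc[0]

print(country_of(df, "Messi"))
print(country_of(df, "Nadal"))
```

Checking `matches.empty` before calling `.iloc[0]` avoids an IndexError when the name is missing.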

Nathaniel
  • Thank you for answering the query. I have made some changes in my post. I want to search for 'Messi' not knowing whether Argentina belongs to the Country column. So I have to search for Argentina in my entire DataFrame to find that it belongs to the Country column. Any suggestions? – shre2306 Mar 21 '19 at 06:36
  • I don't know of a way to search the entire DataFrame for a value more efficiently than using `df.isin()`. If you are working with large datasets, you will experience some slowdowns. Pandas should be able to handle 1 GB, but above 100 GB you may be better off using other tools such as Spark/Hadoop. – Nathaniel Mar 21 '19 at 07:05
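One possible workaround when many queries hit the same DataFrame is to build a lookup table once and answer each query with dictionary lookups instead of rescanning the frame. This is a sketch, not something proposed in the thread: the helper names `build_value_index` and `search` are hypothetical, matching is assumed to be case-insensitive and single-token, and the index costs extra memory, so whether it pays off on a 1 GB dataset would need to be measured.

```python
from collections import defaultdict

import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Player_Name": ["Messi", "Ronaldo", "Kohli", "Federer"],
    "Country": ["Argentina", "Portugal", "India", "Switzerland"],
    "Type_of_sports": ["Football", "Football", "Cricket", "Tennis"],
})

def build_value_index(frame):
    """Map each lower-cased cell value to its (row label, column) locations.

    Hypothetical helper. It runs once per DataFrame; groupby does the
    per-column grouping, so Python only iterates over distinct values.
    """
    index = defaultdict(list)
    for column in frame.columns:
        groups = frame.groupby(column, sort=False).groups
        for value, rows in groups.items():
            index[str(value).lower()].extend((row, column) for row in rows)
    return index

def search(index, query):
    """Return the locations of every query token that is a known cell value."""
    hits = []
    for token in query.lower().split():
        # Strip simple trailing punctuation; unknown tokens yield no hits.
        hits.extend(index.get(token.strip("?.,!"), []))
    return hits

value_index = build_value_index(df)
print(search(value_index, "Which country is Messi from ?"))
print(search(value_index, "Name of players playing football "))
```

Each hit names the column the keyword was found in, which addresses the case where the query does not mention a column. Multi-word cell values would not match single tokens under this assumption and would need a different tokenization.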