3

I am trying to essentially replicate the Find function (control-f) in Python with Pandas. I want to search and entire sheet (all rows and columns) to see if any of the cells on the sheet contain a word and then print out the row in which the word was found. I'd like to do this across multiple sheets as well.

I've imported the sheet:

pdTestDataframe = pd.read_excel(TestFile, sheet_name="Sheet Name", 
keep_default_na= False, na_values=[""])

And tried to create a list of columns that I could index into the values of all of the cells but it's still excluding many of the cells in the sheet. The attempted code is below.

columnsList = []
for i, data in enumerate(pdTestDataframe.columns):
    columnList.append(pdTestDataframe.columns[i])
for j, data1 in enumerate(pdTestDataframe.index):
    print(pdTestDataframe[columnList[i]][j])

I want to make sure that no matter the formatting of the excel sheet, all cells with data inside can be searched for the word(s). Would love any help I can get!

Hunter Sneed
  • 45
  • 1
  • 1
  • 7

1 Answers1

1

Pandas has a different way of thinking about this. Just calling df[df.text_column.str.contains('whatever')] will show you all the rows in which the text is contained in one specific column. To search the entire dataframe, you can use:

mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]

(Source is here)

Josh Friedlander
  • 10,870
  • 5
  • 35
  • 75
  • Can you expand a little on the functionality of the mask string and how to use it? When I tried to implement it, it didn't work properly. – Hunter Sneed Aug 02 '18 at 16:14
  • A mask is a way of filtering your dataframe. Checking if a series (in this context, a column in your df) contains a str returns an array of `TRUE` and `FALSE` values. Applying this 'mask' to your df returns only the rows where the statement is true. By looping through each column, you get the aggregate of all the rows in which the string is contained in at least one column. Hope this helped! I recommend reading the [Pandas beginner guide](https://pandas.pydata.org/pandas-docs/stable/10min.html#min). – Josh Friedlander Aug 02 '18 at 22:51
  • @JoshFriedlander will this search across all sheets in an excel file though? Or is it assuming that a particular sheet has been parsed to a dataframe. Perhaps a file with multiple sheets is a different question, that was how i found this post though (searching for a solution to that) – baxx May 06 '20 at 10:44