
I have seen examples of how a pandas DataFrame can be filtered based on a match within a specific column. Can I expand on that question: instead of searching within a specific column, I am trying to find an efficient way to identify rows containing a regex-matched value across all columns. A nested for loop is just too inefficient, to the point where it's faster to dump the DataFrame to a CSV file and grep it.

There must be a more efficient, pandas-native way to accomplish this?

Thank you!

Igor M
    Yes, it is possible. Please expand a bit more with a [mcve] and actual sample data we can copy and paste into a terminal. – cs95 Feb 19 '19 at 22:42

1 Answer


I will take the existing example from this post, Select rows from a DataFrame based on values in a column in pandas:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

Now, given the above dataset, I am looking for an efficient way to return all rows containing a value in any column that matches a regex.

For example, a search on '1[2,4]|three' should return:

     A      B  C   D
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14
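
One way to produce that result, as a minimal sketch (this is an assumed approach, not the only one; it reuses the df and pattern from the example above): cast every cell to a string, run the vectorized str.contains test on each column, and keep the rows where any column matched.

# Cast every column to string so .str.contains applies uniformly,
# test each column against the regex, then keep any row where at
# least one column matched.
pattern = r'1[2,4]|three'
mask = df.astype(str).apply(lambda col: col.str.contains(pattern, regex=True)).any(axis=1)
print(df[mask])

This prints the three rows shown above; the looping happens inside pandas' string methods, one pass per column, rather than in an explicit nested Python loop over every cell.
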
Gary