
I have datasets that I am scanning for a certain pattern using regex. Some of these tables have millions of rows, and searching column by column is time-consuming, so I am using `iterrows()`.

This way, the loop flags the first row where it finds the matching pattern and then exits. The problem is that I can't determine the column name. Ideally, I want the name of the column where the match was found.

Code sample:

    for index, row in df.iterrows():
        # regex to identify any 9-digit number starting with 456 goes here
        ...

[Image: sample of the data]

Currently, my code prints the index of the row where it found the first match and exits. What's a better way to write this so that I can capture the name or index of the column where the match was found? For the data sample above, I would ideally want the column name "Acc_Number" printed.

aRad
  • You probably should not be using iteration, and especially not `iterrows()`, for data of this size. But if you provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) showing sample input data and expected output, as well as the code you've tried, someone may be able to assist. See [How to Ask](https://stackoverflow.com/questions/how-to-ask) and [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391) if you want more tips. – AlexK Apr 20 '21 at 22:34
  • Thank you! I was still working on it and hadn't realized I posted it by accident. – aRad Apr 21 '21 at 00:27
  • If you are inclined to use `iterrows()`, you can extract the index label (which, for a row, is the column name) corresponding to the value found in the row: `for index, row in df.iterrows(): if row.astype('str').str.contains(regex_pat).any(): print(row[row.astype('str').str.contains(regex_pat)].index[0]) break`. – AlexK Apr 21 '21 at 02:17
  • But there are many faster solutions than `iterrows()` (pandas `itertuples()` and even `.apply()` will be faster; you should also look into numpy methods [here](https://stackoverflow.com/questions/432112/is-there-a-numpy-function-to-return-the-first-index-of-something-in-an-array) and [here](https://stackoverflow.com/questions/7632963/numpy-find-first-index-of-value-fast)). You should also consider whether you really need to search all of these columns, even those with object (string) or datetime type. – AlexK Apr 21 '21 at 02:19
  • Thank you! I'll look into `itertuples()` and `.apply()`. – aRad Apr 21 '21 at 14:15
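
For anyone landing here later, this is a minimal sketch of the two directions mentioned in the comments, not a benchmarked solution. The sample frame is hypothetical (only the column name "Acc_Number" comes from the question), and the regex is my assumption of what "any 9-digit number starting with 456" means. `first_matching_column` and `first_match_by_row` are illustrative helper names, not pandas functions.

```python
import re

import pandas as pd

# Hypothetical sample data; only "Acc_Number" is taken from the question.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Acc_Number": [123456789, 456123789, 987654321],
    "City": ["Oslo", "Lima", "Pune"],
})

# Assumed pattern: a standalone 9-digit number starting with 456.
PATTERN = r"(?<!\d)456\d{6}(?!\d)"


def first_matching_column(frame, pattern):
    """Return the name of the first column containing a match, else None."""
    for col in frame.columns:
        # Vectorized regex search over one whole column at a time.
        if frame[col].astype(str).str.contains(pattern, regex=True).any():
            return col
    return None


def first_match_by_row(frame, pattern):
    """Row-wise variant: return (row index, column name) of the first match."""
    for row in frame.itertuples(index=True):
        # row[0] is the index label; row[1:] are the cell values in column order.
        for col, value in zip(frame.columns, row[1:]):
            if re.search(pattern, str(value)):
                return row[0], col
    return None


print(first_matching_column(df, PATTERN))  # Acc_Number
print(first_match_by_row(df, PATTERN))     # (1, 'Acc_Number')
```

The column-wise version scans each column in full before moving on, so it does not stop at the first matching row; the `itertuples()` version stops at the first matching cell but does the regex work in Python. Which one is faster will likely depend on where the first match sits in the data, so it is worth timing both on a real table.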
