There has been a data breach, and I need to find the paths of all files on a file server that contain email addresses.

I tried:

grep -lr --include='*.{csv,xls,xlsx,txt}' "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" * >output.txt

But this returns nothing.

I would be grateful for any suggestions. Thanks!

1 Answer

Your grep command is almost correct; only a few small glitches keep it from working.

First, your email regex relies on extended-regex syntax (the + and {2,6} quantifiers), so you need grep's extended regex option -E. Without it, grep treats those characters literally.
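To see the effect of -E in isolation, here is a small sketch that runs the same pattern against one sample line with and without the option. It assumes GNU grep (for \b), and the sample address is made up for illustration.

```shell
pattern='\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'
sample='contact: alice@example.com'   # made-up sample line

# Basic regex (the default): + and {2,6} are literal characters, so no match.
basic=$(printf '%s\n' "$sample" | grep -c "$pattern")

# Extended regex (-E): + and {2,6} act as quantifiers, so the address matches.
extended=$(printf '%s\n' "$sample" | grep -Ec "$pattern")

echo "basic: $basic, extended: $extended"
```

The count of matching lines should be 0 for the basic run and 1 for the extended run.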

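Next, as explained in this answer to another question, your --include pattern will not work as written. Inside the quotes the shell (zsh here, and bash behaves the same way) does not expand the braces, so grep receives the literal pattern *.{csv,xls,xlsx,txt}, and grep's --include globs do not support braces. Put the closing quote before the braces, as follows: --include='*.'{csv,xls,xlsx,txt}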
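The difference between the two quoting styles can be checked without grep at all. The sketch below assumes bash or zsh (plain sh does not perform brace expansion), and show_args is a hypothetical helper that only prints the arguments it receives.

```shell
# show_args is a hypothetical helper: print each argument on its own line.
show_args() { printf '%s\n' "$@"; }

# Braces inside the quotes: one argument, passed through unchanged.
quoted=$(show_args --include='*.{csv,xls,xlsx,txt}')

# Braces outside the quotes: the shell expands them into four --include options.
expanded=$(show_args --include='*.'{csv,xls,xlsx,txt})

echo "quoted:"
echo "$quoted"
echo "expanded:"
echo "$expanded"
```

In the expanded form grep receives --include=*.csv, --include=*.xls, --include=*.xlsx and --include=*.txt as separate options, which is what it needs.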

Finally, if you want to search every file on the server, run the command on the root directory / rather than on *, which only covers the files and directories in the directory you are in when you run the command.

So your grep command should be:

grep -Elr --include='*.'{csv,xls,xlsx,txt} "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" /
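Before scanning a whole server, it may be worth trying the command on a throwaway directory. The sketch below does that with made-up file names and contents, assuming GNU grep. The --include options are spelled out one per extension, which is what the brace form expands to.

```shell
# Throwaway directory with three sample files (hypothetical names and contents).
demo=$(mktemp -d)
mkdir -p "$demo/share/hr"
printf 'name,email\nAlice,alice@example.com\n' > "$demo/share/hr/staff.csv"
printf 'no addresses in this file\n' > "$demo/share/notes.txt"
printf 'bob@example.org\n' > "$demo/share/ignored.log"

pattern='\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# Same options as the command above, pointed at the demo directory instead of /.
# notes.txt contains no address and ignored.log is excluded by --include.
matches=$(grep -Elr --include='*.csv' --include='*.xls' --include='*.xlsx' \
    --include='*.txt' "$pattern" "$demo")
echo "$matches"

rm -rf "$demo"
```

Only staff.csv should be listed. When the same command is pointed at /, expect permission-denied messages on stderr unless it runs with sufficient privileges.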

Some points to take into account:

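  • You will not reliably detect emails in Excel files: .xlsx files are zip-compressed archives and .xls files use a binary format, so grep cannot read their text the way it reads a .csv or .txt file.
  • Email pattern matching is rather hard, because there are a lot of special cases in email parsing. The pattern you're currently using will catch almost all emails, but not all of them. For example, the {2,6} limit misses addresses whose top-level domain is longer than six letters.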
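One possible workaround for the .xlsx case is to decompress each workbook and grep the extracted XML. This is only a sketch: it assumes unzip is installed, it does not cover the older .xls format, and xlsx_has_email and list_xlsx_with_email are hypothetical helper names, not existing tools.

```shell
pattern='\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# Hypothetical helper: succeeds if any member of the .xlsx (zip) archive
# contains text matching the email pattern.
xlsx_has_email() {
    unzip -p "$1" 2>/dev/null | grep -Eq "$pattern"
}

# Hypothetical helper: print every .xlsx under the given directory for which
# xlsx_has_email succeeds.
list_xlsx_with_email() {
    find "$1" -type f -name '*.xlsx' | while IFS= read -r file; do
        if xlsx_has_email "$file"; then
            printf '%s\n' "$file"
        fi
    done
}
```

Calling list_xlsx_with_email / >>output.txt would then append the matching workbooks to the list produced by grep.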
Vincent Doba