
I have a large text file called "main" that contains a list of email addresses, and I have already sent mail to some of them. I also have a list of the "sent" addresses. Now I want to remove the "sent" addresses from "main".

In other words, I want to remove every matching row from the text file, including all of its duplicates. Example:

I have:

email@email.com
test@test.com
email@email.com

I want:

test@test.com

Is there an easy way to achieve this? Please suggest a tool or method, keeping in mind that the text file is larger than 10 MB.

leopard121
  • Does Notepad++ support regex/scripting? –  Sep 20 '14 at 22:56
  • You can probably use PowerShell's Compare-Object cmdlet: http://technet.microsoft.com/en-us/library/ee156812.aspx – T I Sep 20 '14 at 23:12
  • I am not completely sure about your requirements, but if your main and sent lists are in the same file, maybe [my answer here](http://stackoverflow.com/a/16293580/626273) can help you. – stema Sep 22 '14 at 08:37
  • @leopard121 What do you mean by `remove both the matching raw from the text file`? – Cullub Sep 22 '14 at 13:22
  • @stema Thanks for the link. The code works, but it does not remove all the matching rows. I mean, if there are 10 duplicate rows, it removes nine of them, but I need to remove all of them. – leopard121 Sep 24 '14 at 11:41
  • @cullub I want to remove all matching rows. Please see the example. – leopard121 Sep 24 '14 at 11:42

2 Answers


In a terminal:

sort test | uniq -c | awk '$1 == 1 { print $2 }'
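
To see what the pipeline above does, here is a minimal sketch run against the sample from the question. The temporary file is a hypothetical stand-in for `test` and is not part of the answer. `uniq -c` prefixes each distinct line with the number of times it occurs, and awk keeps only the lines whose count is 1, so every copy of a duplicated address is dropped.

```shell
# Throwaway copy of the sample list from the question (hypothetical temp file).
sample=$(mktemp)
printf '%s\n' 'email@email.com' 'test@test.com' 'email@email.com' > "$sample"

# Count each distinct line, then keep only the lines that occur exactly once.
result=$(sort "$sample" | uniq -c | awk '$1 == 1 { print $2 }')
echo "$result"   # prints: test@test.com

rm -f "$sample"
```

Because only the second field is printed, this assumes the lines contain no spaces, which holds for plain email addresses. `sort test | uniq -u` should give the same result with one less step.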
learnerer

I use Cygwin a lot for such tasks, as the Unix command line is incredibly powerful.

Here's how to achieve what you want:

cat main.txt | sort -u | grep -Fvxf sent.txt

sort -u will remove duplicates (by sorting the main.txt file first), and grep will take care of removing the unwanted addresses.

Here's what the grep options mean:

  • -F plain text search
  • -v invert results
  • -x will force the whole line to match the pattern
  • -f read patterns from the specified file
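
As a rough illustration of how these options combine, the sketch below runs the same pipeline on two small sample files. The temporary directory and the address `mytest@test.com` are assumptions added for the demonstration. That extra address shows why `-x` matters: without it, the pattern `test@test.com` would also match as a substring and the line would be dropped.

```shell
# Hypothetical sample files standing in for main.txt and sent.txt.
workdir=$(mktemp -d)
printf '%s\n' 'email@email.com' 'test@test.com' 'email@email.com' 'mytest@test.com' > "$workdir/main.txt"
printf '%s\n' 'test@test.com' > "$workdir/sent.txt"

# De-duplicate main.txt, then drop every line that exactly equals a line in sent.txt.
remaining=$(sort -u "$workdir/main.txt" | grep -Fvxf "$workdir/sent.txt")
echo "$remaining"
# prints:
# email@email.com
# mytest@test.com

rm -rf "$workdir"
```

Note that `email@email.com` survives once here: `sort -u` collapses duplicates to a single copy rather than removing all of them.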

Oh, and if your files are in the Windows format (CR LF newlines), you'll have to do this instead:

cat main.txt | dos2unix | sort -u | grep -Fvxf <(cat sent.txt | dos2unix)
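
If `dos2unix` is not installed, or the shell lacks the `<( ... )` process substitution used above, one possible fallback is to strip the carriage returns with `tr -d '\r'` and write the cleaned sent list to an intermediate file. This is a swapped-in technique rather than the command from the answer, and the file names are hypothetical.

```shell
# Hypothetical CRLF-terminated sample files standing in for main.txt and sent.txt.
workdir=$(mktemp -d)
printf '%s\r\n' 'email@email.com' 'test@test.com' 'email@email.com' > "$workdir/main.txt"
printf '%s\r\n' 'email@email.com' > "$workdir/sent.txt"

# Strip carriage returns from the pattern file first, so -x can match whole lines.
tr -d '\r' < "$workdir/sent.txt" > "$workdir/sent.unix.txt"

# Same pipeline as before, with tr taking the place of dos2unix.
cleaned=$(tr -d '\r' < "$workdir/main.txt" | sort -u | grep -Fvxf "$workdir/sent.unix.txt")
echo "$cleaned"   # prints: test@test.com

rm -rf "$workdir"
```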

Just like with the Windows command line, you can simply add:

> output.txt

at the end of the command line to redirect the output to a text file.

Lucas Trzesniewski