Mass cross reference in Notepad++

Question

I have a text file (A.txt) with 20,000 domain names, one per line. I have another text file (B.txt) that contains thousands of Whois records compiled together. I want to see which domains in A.txt are not referenced in B.txt. It's trivial to do this one by one, but how can I do it en masse? Thanks

Is using [spreadsheets/Excel](http://stackoverflow.com/questions/4160243/join-two-spreadsheets-on-a-common-column-in-excel-or-openoffice) out of the question? — Primoz, Mar 22 '13 at 10:13

score 0 · Answer 1 · answered Mar 23 '13 at 09:35

You could do this with a sort and two regular expression passes in Notepad++:

1. Edit A.txt so that its lines have the form `example.com A other stuff`, and edit B.txt so that its lines have the form `example.com B other stuff`.
2. Combine the two files into one and sort the lines, so that each A line sits directly above the B line for the same domain.
3. Run a regular expression replace, searching for `^([^ ]+) A .*\r\n(\1 B )` and replacing with `\2`. Any A.txt line that matches a B.txt line is removed, leaving the B.txt line.
4. If several A.txt lines match one B.txt line, run the replace two or more times until no lines are replaced.
5. Finally, delete the B.txt lines: use a regular expression to find and bookmark the lines matching `^([^ ]+) B`, then remove the bookmarked lines. What remains are the unmatched A.txt lines.
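To see what the replace does, the same passes can be sketched in Python on a handful of made-up lines. This is only an illustration under assumptions: the function name `unmatched_a_lines` and the sample data are hypothetical, line endings are `\n` rather than `\r\n`, and Python's `re` module stands in for the Notepad++ regex engine.

```python
import re

# Same idea as the Notepad++ search pattern, with "\n" instead of "\r\n".
PAIR_RE = re.compile(r"^([^ ]+) A .*\n(\1 B )", re.MULTILINE)


def unmatched_a_lines(tagged_lines):
    """Return the domains of A lines that have no B line for the same domain."""
    # Sorting puts "example.com A ..." directly above "example.com B ...".
    text = "\n".join(sorted(tagged_lines)) + "\n"

    # Repeat the replace until nothing changes, which handles duplicate A lines.
    while True:
        text, count = PAIR_RE.subn(r"\2", text)
        if count == 0:
            break

    # Drop the B lines; whatever A lines remain were never matched.
    return [
        line.split(" ", 1)[0]
        for line in text.splitlines()
        if re.match(r"[^ ]+ A ", line)
    ]


sample = [
    "a.com A x",
    "a.com B whois record",
    "b.org A x",
    "c.net B whois record",
    "c.net A x",
    "c.net A x",  # duplicate A line, needs a second pass
]
print(unmatched_a_lines(sample))
```

Here only `b.org` has no B line, so it is the only domain left at the end.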

Not knowing the format of the source files A.txt and B.txt, I cannot suggest a regular expression that puts the domain followed by an A or B at the start of each line.
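If leaving Notepad++ is acceptable, a short script avoids the tagging step altogether. The following is a minimal sketch, not a definitive solution: it assumes that "referenced" means the domain appears anywhere in B.txt as a domain-like token (case-insensitive), and the names `DOMAIN_RE`, `unreferenced_domains` and `report` are hypothetical.

```python
import re

# Assumed shape of a domain-like token: labels of letters, digits and hyphens
# joined by dots. Real Whois data may need a stricter or looser pattern.
DOMAIN_RE = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)+")


def unreferenced_domains(domains, whois_text):
    """Return the entries of `domains` that never appear in `whois_text`."""
    seen = set(DOMAIN_RE.findall(whois_text.lower()))
    missing = []
    for line in domains:
        name = line.strip()
        if name and name.lower() not in seen:
            missing.append(name)
    return missing


def report(a_path, b_path):
    """Read the list of domains and the Whois dump, print the unreferenced ones."""
    with open(a_path, encoding="utf-8", errors="replace") as f:
        domains = f.readlines()
    with open(b_path, encoding="utf-8", errors="replace") as f:
        whois_text = f.read()
    for name in unreferenced_domains(domains, whois_text):
        print(name)
```

Calling `report("A.txt", "B.txt")` prints the domains from A.txt that were not found. Note that a record mentioning only `www.example.com` is not counted as a reference to `example.com` under this pattern, so adjust the matching if subdomains should count.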
