I have two text files in the format below:

  • File 1:

    2017-08-16 00:00:00,115 - [INFO]  TRANSACTIONS: 123456788 id: 123456
    2017-08-16 00:00:00,115 - [INFO]  TRANSACTIONS: 123456789 id: 123457
    
  • File 2:

    123456 123457 123458 123459

The goal: I would like to get the records from file1 whose id is not in file2.

The command lines I tried, and the results:

  • 1st command line: grep -vf file2 file1
  • 2nd command line: comm -23 <(sort file1) <(sort file2)

Both commands work, but there are 3 million records in file1 and 1 million records in file2. The 1st command completes when there are not many records, but it cannot complete with 3 million. The 2nd command is faster than the 1st and completes when I run it manually in the SSH console, but it does not work in a bash script. The error shown is: syntax error at "(".

Any idea how to solve this and reach the goal?

  • See: [Fast way of finding lines in one file that are not in another?](https://stackoverflow.com/q/18204904/3776858) – Cyrus Aug 21 '17 at 04:44
  • Possible duplicate of [Fast way of finding lines in one file that are not in another?](https://stackoverflow.com/questions/18204904/fast-way-of-finding-lines-in-one-file-that-are-not-in-another) – Cyrus Aug 21 '17 at 05:05

2 Answers

awk 'NR==FNR{a[$1];next} !($NF in a)' file2 file1
Ed Morton
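
A minimal sketch of how the one-liner above behaves on small sample data. It assumes file2 holds one id per line (the sample in the question shows the ids on a single row, so this layout is an assumption) and that the id is the last field of each file1 line. The third log line and the scratch directory are hypothetical, added only for illustration.

```shell
# Build small sample inputs in a scratch directory (hypothetical data modeled on the question).
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
2017-08-16 00:00:00,115 - [INFO]  TRANSACTIONS: 123456788 id: 123456
2017-08-16 00:00:00,115 - [INFO]  TRANSACTIONS: 123456789 id: 123457
2017-08-16 00:00:00,116 - [INFO]  TRANSACTIONS: 123456790 id: 999999
EOF
printf '%s\n' 123456 123457 123458 123459 > "$tmpdir/file2"

# Pass 1 (NR==FNR holds only while reading the first file, file2):
#   store each id as an array key, then skip to the next line.
# Pass 2 (file1): print the line only when its last field ($NF) is not a stored id.
result=$(awk 'NR==FNR{a[$1];next} !($NF in a)' "$tmpdir/file2" "$tmpdir/file1")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Only the line with id 999999 should be printed, since the other two ids appear in file2. If file2 really stores all ids on one row, converting it first with `tr -s ' ' '\n' < file2` should produce the one-id-per-line layout assumed here.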
  • Just to let you know that "This answer was flagged as low-quality because of its length and content." I've clicked "Looks OK" but who knows... – gboffi Aug 21 '17 at 10:05
  • Yeah, I expected it would be, as the tool that does the flagging flags trivial answers with no explanation but doesn't flag complicated answers with no explanation, and I'm just not interested in adding a bunch of explanatory text around trivial answers. Thanks for the heads up and marking it OK. – Ed Morton Aug 21 '17 at 13:48

I found a way to make it work in the script with the 2nd command, by sorting into intermediate files first:

sort file1 > file1.txt
sort file2 > file2.txt
comm -23 file1.txt file2.txt > result.txt
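
A syntax error at "(" is what a plain POSIX sh typically reports for `<( ... )`, because process substitution is a feature of bash (and ksh/zsh), not of sh. The sketch below assumes that is the cause here, which the question does not confirm, and shows the original form running from a script that is executed by bash. The script name `diff.sh` and the sample lines are hypothetical. Note that `comm` compares whole lines, so the sample files contain lines that can match exactly.

```shell
# Scratch directory with two small sample files (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' b a c > "$tmpdir/file1"
printf '%s\n' c d   > "$tmpdir/file2"

# A script that keeps the process substitution; the shebang requests bash.
cat > "$tmpdir/diff.sh" <<'EOF'
#!/bin/bash
# Print lines that appear only in the first file (both inputs sorted on the fly).
comm -23 <(sort "$1") <(sort "$2")
EOF

# Run it explicitly with bash; "sh diff.sh" may fail on the "(" instead.
result=$(bash "$tmpdir/diff.sh" "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

This avoids the intermediate sorted files, at the cost of requiring bash rather than any POSIX shell.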