0

I followed the question "perl compare two file and print the matching lines" and used a hash to find the lines that match or don't match between two files.

But I find that the hash rearranges the lines, and I want the lines in their original order. I could write multiple for loops to get the results in order, but that is not as efficient as a hash. Has anyone faced this issue before, and could you please share your solution?

Raghav

3 Answers

2

Maybe I don't fully understand the question, but is

fgrep -xf file2 file1

not enough?

Or

fgrep -xf file1 file2

Yes, it is not Perl, but it is short, simple, and fast.
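
As a minimal sketch of how this covers both halves of the question (lines that match and lines that don't), the following uses made-up sample files. `grep -F` is the modern spelling of `fgrep`, and `-v` inverts the match. The file contents and the `workdir` variable are assumptions for illustration only.

```shell
# Sample data, invented for this example.
workdir=$(mktemp -d)
printf '%s\n' cherry apple banana date > "$workdir/file1"
printf '%s\n' banana elderberry cherry > "$workdir/file2"

# Lines of file1 that also appear in file2, printed in file1's order.
matches=$(grep -Fxf "$workdir/file2" "$workdir/file1")

# Lines of file1 that do NOT appear in file2 (-v inverts the match).
only_in_file1=$(grep -Fxvf "$workdir/file2" "$workdir/file1")

printf 'matching:\n%s\n' "$matches"
printf 'only in file1:\n%s\n' "$only_in_file1"

rm -rf "$workdir"
```

With this data the matching lines come out as cherry, banana (the order of file1, not file2), and the non-matching lines as apple, date. Swapping the two file arguments gives the results in file2's order instead.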

clt60
  • That is pretty short and sweet. Agree this should solve the whole problem without the intermediate Perl. But there is a risk of partial matches. If you add the `-x` flag you only match whole lines, which is what the OP wanted, I think. It would be interesting to have a speed comparison vs. his two-step approach. – Floris Jul 10 '13 at 22:15
1

This can be done efficiently in two steps. Let's assume you have already found the "lines that match", but they are in the wrong order; a simple grep can then put them back in order, because grep prints matches in the order they appear in the file it searches. Assuming you have a script matchThem that takes two inputs (file1 and file2) and prints the matching lines, the overall script will be:

matchThem file1 file2 > tempFile
grep -Fx -f tempFile file1

The -Fx flags mean:

-F : treat each pattern as a fixed string rather than a regular expression (much faster)
-x : only match whole lines
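
To illustrate the reordering step, here is a small sketch with made-up data. matchThem is hypothetical, so its output is simulated by writing the matching lines to tempFile in scrambled order; the `workdir` variable and the file contents are likewise assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' cherry apple banana date > "$workdir/file1"

# Stand-in for the output of the hypothetical matchThem script:
# the right lines, but in the wrong order.
printf '%s\n' date banana cherry > "$workdir/tempFile"

# grep scans file1 from top to bottom, so the matches come out
# in file1's order regardless of the order inside tempFile.
ordered=$(grep -Fx -f "$workdir/tempFile" "$workdir/file1")
printf '%s\n' "$ordered"

rm -rf "$workdir"
```

Here the output is cherry, banana, date: the lines from tempFile, restored to the order in which they occur in file1.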
Floris
  • I could not get it working with "grep -Fx tempFile file1", but "fgrep -f tempFile file1" works. Thanks @Floris – Raghav Jul 10 '13 at 22:17
  • I only have short files to compare, so I don't see much of a difference, and only grep gives the output in order (which I need). But I found online that Perl hash tables scale better for large log files: http://stackoverflow.com/questions/11490036/fast-alternative-to-grep-f – Raghav Jul 11 '13 at 04:50
  • Interesting link - if you ever do run this on large files would you update us on the time difference? – Floris Jul 11 '13 at 11:53
1

If you want a hash that keeps insertion order, try the CPAN module Tie::IxHash.

Slaven Rezic