
I have two files, file1 and file2. file1 has 63000 lines and file2 has 6000 lines. I need to print the lines that appear in both files.

file1

1bl9
1bln_2
1bln_3
1blx
1blx
1bm3
1bm3
1bm9_1
1bm9_2
1bm9_1
1bm9_2

file2

1blx
1blx
1bm4
1bln_2

output

1blx
1blx
1bln_2

I used the following program. It works for files with a small number of lines, but it doesn't work for files with a large number of lines.

awk 'FNR==NR { a[$0]; next } $0 in a' file2 file1
abar
  • "It doesn't work" is not enough information. In what way does it not work? What's a "large number of lines"? – ooga Jul 19 '14 at 15:54
  • @abar Use a more stable Awk that can handle large lines like GNU Awk or go for other languages like Perl, Python, or Ruby. – konsolebox Jul 19 '14 at 16:37
  • @ooga file1 has 63000 lines and file2 has 6000 lines. My code doesn't work with these files. But if I reduce the number of lines, my code works. – abar Jul 19 '14 at 17:09
  • Again, in *what sense* does it not work? What happens? Do you get output, but the output is incorrect? Do you not get any output? I notice that you put the smaller file first, so that's the one that you're reading into the array. Do the lines look like what you've shown above? 6000 of those very small lines doesn't take much memory. And what system are you on? – ooga Jul 19 '14 at 17:16
  • @ooga I don't get any output. The lines look like the above example. I am using Ubuntu 14.04 LTS. I couldn't find a solution for this. – abar Jul 20 '14 at 02:57
  • @abar Maybe one of your files is not in UNIX format? – konsolebox Jul 20 '14 at 09:51
  • This question should be reopened since the linked-to "answer" is terrible and there seems to be an issue here not covered there. – ooga Jul 20 '14 at 14:05

1 Answer


Simply use fgrep, or equivalently grep -F, together with -f to read the patterns from a file:

fgrep -f file1 file2

Or use awk:

awk 'NR==FNR{a[$0]++;next}a[$0]' file1 file2

Both commands output:

1blx
1blx
1bln_2
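
One caveat worth checking: grep -F matches fixed strings anywhere in a line, so a pattern such as 1bm9 would also match the line 1bm9_1. A minimal sketch, assuming the goal is whole-line equality, adds the -x option. The sample files below are made up for illustration and are not the asker's data:

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 1blx 1bm9 1bln_2 > "$tmp/file1"
printf '%s\n' 1blx 1bm9_1 1bln_2 > "$tmp/file2"

# -F: treat patterns as fixed strings; -f: read patterns from file1.
# Matches anywhere in the line, so 1bm9 also hits 1bm9_1.
substr=$(grep -F -f "$tmp/file1" "$tmp/file2")

# -x: the pattern must match the whole line.
exact=$(grep -F -x -f "$tmp/file1" "$tmp/file2")

printf 'substring match:\n%s\n' "$substr"
printf 'whole-line match:\n%s\n' "$exact"
rm -rf "$tmp"
```

With IDs like the ones in the question, where some entries are prefixes of others, the -x form is probably the closer equivalent of the awk version, which compares whole lines.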

Note: You should make sure your files are in UNIX format:

sed -i.bak 's|\r||' file1 file2

Or use dos2unix (only use once per file):

dos2unix file1
dos2unix file2
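
If converting the files in place is not desirable, carriage returns can also be detected first and stripped on the fly. The following is a minimal sketch with made-up sample files, assuming that DOS line endings are the cause of the empty output (which the thread suggests but does not confirm):

```shell
# Hypothetical sample data: file2 has DOS (CRLF) line endings, file1 does not.
tmp=$(mktemp -d)
printf '1blx\r\n1bm4\r\n' > "$tmp/file2"
printf '1blx\n1bm3\n' > "$tmp/file1"

# Count the lines in file2 that end with a carriage return.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr\$" "$tmp/file2")

# Without stripping, "1blx\r" is not equal to "1blx", so nothing is printed.
raw=$(awk 'FNR==NR { a[$0]; next } $0 in a' "$tmp/file2" "$tmp/file1")

# Strip a trailing CR from every line of both files before comparing.
fixed=$(awk '{ sub(/\r$/, "") } FNR==NR { a[$0]; next } $0 in a' "$tmp/file2" "$tmp/file1")

echo "lines ending in CR: $crlf_lines"
echo "without stripping: [$raw]"
echo "with stripping: [$fixed]"
rm -rf "$tmp"
```

A nonzero count from the grep -c step is a quick sign that a file needs converting before the comparison can work.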
konsolebox
  • Thank you very much for your answer. Your code works with the given example. But it doesn't work with my original files. – abar Jul 19 '14 at 17:05
  • The `grep -Ff` is a great idea! I think the `a[$0]` pattern in the `awk` example should be `$0 in a` to avoid needlessly adding null strings (not to mention the keys) to `a`, as I believe happens with a plain `a[$0]`. – ooga Jul 19 '14 at 17:26
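
The point raised in the last comment can be checked directly: in awk, merely referencing a[$0] creates that element, while the in operator only tests for it. A minimal sketch with made-up sample files that counts the keys left in the array after each variant:

```shell
# Hypothetical sample data.
tmp=$(mktemp -d)
printf '%s\n' 1blx 1bln_2 > "$tmp/file1"
printf '%s\n' 1blx 1bm4 1bm5 > "$tmp/file2"

# Variant 1: a[$0] as the pattern. Each unmatched line of file2
# is added to the array as a side effect of the lookup.
with_ref=$(awk 'NR==FNR { a[$0]++; next }
                a[$0]   { hits++ }
                END     { n = 0; for (k in a) n++; print n }' "$tmp/file1" "$tmp/file2")

# Variant 2: "$0 in a" only tests membership and creates nothing.
with_in=$(awk 'NR==FNR  { a[$0]; next }
               $0 in a  { hits++ }
               END      { n = 0; for (k in a) n++; print n }' "$tmp/file1" "$tmp/file2")

echo "keys after a[\$0] lookup: $with_ref"
echo "keys after \$0 in a test: $with_in"
rm -rf "$tmp"
```

Both variants print the same matching lines when used as in the answer; the difference is only in memory use, which grows with the number of unmatched lines in the second file when a[$0] is used.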