2

Print only the lines that exist in all four of the given input files. In the input files shown below, only /dev/dev_sg2 and /dev/dev_sg3 exist in every file.

$ cat file1
/dev/dev_sg1
/dev/dev_sg2
/dev/dev_sg3
/dev/dev_sg4

$ cat file2
/dev/dev_sg8
/dev/dev_sg2
/dev/dev_sg3
/dev/dev_sg6

$ cat file3
/dev/dev_sg5
/dev/dev_sg2
/dev/dev_sg3
/dev/dev_sg6

$ cat file4
/dev/dev_sg2
/dev/dev_sg3
/dev/dev_sg1
/dev/dev_sg4

What I tried:

cat file* | sort |uniq -c

      1 /dev/dev_sg1
      4 /dev/dev_sg2
      4 /dev/dev_sg3
      1 /dev/dev_sg4
      1 /dev/dev_sg5
      2 /dev/dev_sg6
      1 /dev/dev_sg8
asokan
    Possible duplicate of [Finding common value across multiple files containing single column values](https://stackoverflow.com/questions/43472246/finding-common-value-across-multiple-files-containing-single-column-values) – Sundeep Jan 02 '18 at 07:45

4 Answers

1

With a comm pipeline:

comm -12 <(sort file1) <(sort file2) | comm -12 - <(sort file3) | comm -12 - <(sort file4)
  • -12 suppresses columns 1 and 2 (the lines unique to the first input and the lines unique to the second input), so only the lines common to both inputs are printed

The output:

/dev/dev_sg2
/dev/dev_sg3
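
The same pipeline can be folded over any number of files. The following is a minimal sketch, not part of the original answer: the function name common_comm is hypothetical, and it assumes mktemp is available. It keeps a running intersection in a temporary file and intersects it with each further file in turn.

```shell
# common_comm FILE... : print the lines present in every FILE (hypothetical helper).
common_comm() {
    acc=$(mktemp) || return 1          # running intersection
    nxt=$(mktemp) || return 1          # scratch file for the next intersection
    sort -u -- "$1" > "$acc"           # start with the sorted first file
    shift
    for f in "$@"; do
        # comm needs sorted input; keep only lines common to both sides
        sort -u -- "$f" | comm -12 "$acc" - > "$nxt"
        mv "$nxt" "$acc"
    done
    cat "$acc"
    rm -f "$acc" "$nxt"
}
```

Called as common_comm file1 file2 file3 file4, it should print the same two lines as the pipeline above, in sorted order.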
RomanPerekhrest
0

The following awk code may help here.

awk 'FNR==NR{a[$0];next} ($0 in a){++c[$0]} END{for(i in c){if(c[i]==3){print i,c[i]+1}}}' Input_file1 Input_file2 Input_file3 Input_file4

The output will be as follows.

/dev/dev_sg2 4
/dev/dev_sg3 4

EDIT: If you don't want the count and simply want to print the lines that appear in all 4 Input_files, the following will do the trick:

awk 'FNR==NR{a[$0];next} ($0 in a){++c[$0]} END{for(i in c){if(c[i]==3){print i}}}'  Input_file1 Input_file2 Input_file3 Input_file4

EDIT2: Adding an explanation of the code.

awk '
FNR==NR{      ## FNR==NR is true only while the first file (Input_file1) is being read.
 a[$0];       ## Store the current line $0 as an index of array a.
 next         ## next is a built-in awk statement: skip the remaining rules and read the next line.
}
($0 in a){    ## Reached only from Input_file2 onwards: is the current line present in array a?
 ++c[$0]      ## If so, increment the counter in array c for this line.
}
END{          ## The END block runs after all files have been read.
for(i in c){  ## Loop over every line recorded in array c.
 if(c[i]==3){ ## A count of 3 means the line was also seen in the 3 remaining files (assuming no line repeats within a file).
   print i    ## Print the line, which therefore appears in all 4 files.
}
}}
' Input_file1 Input_file2 Input_file3 Input_file4 ## The 4 input files.
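
The code above relies on each line occurring at most once per file and on there being exactly 4 files. As a hedged variation, not the answer's own code, the sketch below counts each line at most once per file and compares the count against the number of file arguments (ARGC - 1). The function name common_awk is hypothetical, and it assumes every argument is a distinct file name.

```shell
# common_awk FILE... : print the lines present in every FILE (hypothetical helper).
common_awk() {
    awk '
        # Count each distinct line at most once per input file.
        !seen[FILENAME, $0]++ { count[$0]++ }
        END {
            # ARGC - 1 is the number of file operands given to awk.
            for (line in count)
                if (count[line] == ARGC - 1)
                    print line
        }
    ' "$@" | sort    # for-in order is unspecified, so sort the result
}
```

A call such as common_awk Input_file1 Input_file2 Input_file3 Input_file4 should then report only the lines shared by all four files, even if a line is repeated inside one of them.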
RavinderSingh13
0

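If you know beforehand that there will be exactly 4 input files, you could simply add grep to the end of your existing solution, like this:

cat file* | sort | uniq -c | grep -E '^ *4 '

This shows only the lines whose count is 4, that is, the lines that occur in all four files (assuming no line is repeated within a single file). Note that uniq -c usually pads the count with leading spaces, so the pattern has to allow for them; a bare '^4' would not match.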

If you need this to work for arbitrary number of files, a better solution is needed.
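
One way to generalize the counting approach, offered as a sketch rather than a definitive solution: de-duplicate each file on its own, count occurrences across all files, and keep the lines whose count equals the number of files. The function name common_lines is hypothetical.

```shell
# common_lines FILE... : print the lines present in every FILE (hypothetical helper).
common_lines() {
    n=$#                               # number of files given
    for f in "$@"; do
        sort -u -- "$f"                # each line at most once per file
    done |
    sort | uniq -c |
    # keep lines whose count equals the number of files, then strip the count
    awk -v n="$n" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```

For the files in the question, common_lines file1 file2 file3 file4 should print /dev/dev_sg2 and /dev/dev_sg3.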

Gnudiff
0

If the order doesn't need to be maintained:

$ j() { join <(sort "$1") <(sort "$2"); }; j <(j file1 file2) <(j file3 file4)

/dev/dev_sg2
/dev/dev_sg3
karakfa