
I have 100 files, each containing 3 columns and a different number of rows. All three columns contain repeating elements. I want to find the elements common to all 100 files. The files look like:

1.txt

5901 5902   8229
5901 5902  17481
5901 5902  27561
5929 5930  12875

2.txt

5901 5902  8229
5929 5930  12875

and so on. The code I am trying to use is:

for ((i=0;i<=100;i++))
do
  comm -12 file-"$i".txt file-"$((i+1))".txt > common-element-"$i".txt
done

I have used the comm command, but that only compares 2 files at a time. I have 100 such files.

Cœur
  • what are *the common elements*? – RomanPerekhrest Apr 21 '17 at 12:24
  • The files which I have shown above have common elements like 5901 5902 8229 – Abhinav Srivastava Apr 21 '17 at 12:27
  • do add the `comm` command you tried for 2 files... did it solve for two files? if so, you could very well use a loop, like it was shown in your previous question: https://stackoverflow.com/questions/43472246/finding-common-value-across-multiple-files-containing-single-column-values – Sundeep Apr 21 '17 at 12:28
  • You want to output any number, regardless of row or column, that appears in all 100 files? – jas Apr 21 '17 at 12:28
  • Yes regardless of rows as column numbers are same. I want output for those numbers which are present in all 100 files – Abhinav Srivastava Apr 21 '17 at 12:34
  • I am using the following loop: for ((i=0;i<=100;i++)) do comm -12 -nocheck-order file-"$i".txt file-"$((i+1))".txt > common-element.txt done Will it work for comparing elements among all 100 files? – Abhinav Srivastava Apr 21 '17 at 12:36
  • please click https://stackoverflow.com/posts/43542609/edit to add the code you tried to question and use https://stackoverflow.com/editing-help if you face formatting issues – Sundeep Apr 21 '17 at 12:46

1 Answer

If the rows are unique within each file, you can count the occurrences of each row and select those whose count equals the number of files. This can be done with uniq -c after sorting all the files together, but sorting is not required with the alternative below.
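As a rough illustration of the sort and uniq -c route, here is a minimal sketch. The scratch directory, the two sample files, and the variable names `tmp`, `n` and `common` are assumptions made for the demo; it also assumes no row repeats within a single file.

```shell
# Sample data from the question, written to a scratch directory (demo setup only).
tmp=$(mktemp -d)
printf '%s\n' '5901 5902   8229' '5901 5902  17481' '5901 5902  27561' '5929 5930  12875' > "$tmp/file1.txt"
printf '%s\n' '5901 5902  8229' '5929 5930  12875' > "$tmp/file2.txt"

n=2   # number of input files; this would be 100 for the real data

# 1. awk '{$1=$1} 1' rewrites each row with single spaces between fields.
# 2. sort puts identical rows next to each other so uniq -c can count them.
# 3. The last awk keeps rows whose count equals n and strips the count column.
common=$(awk '{$1=$1} 1' "$tmp"/file*.txt | sort | uniq -c |
         awk -v n="$n" '$1 == n {$1=""; sub(/^ /, ""); print}')
printf '%s\n' "$common"
```

Unlike the one-pass awk command, this needs the file count passed in by hand and sorts the whole input, which may matter for large files.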

awk to the rescue!

awk '{$1=$1} ++a[$0]==(ARGC-1)' file{1..100}.txt

5901 5902 8229
5929 5930 12875

The $1=$1 assignment forces awk to rebuild each line with single spaces between fields, which normalizes the whitespace since it is not consistent in your example.
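The counting approach relies on each row appearing at most once per file. If that may not hold, one possible variation is to count a row only the first time it is seen in a given file. The sketch below is an assumption-laden demo, not part of the original answer: the `seen` and `count` array names, the scratch directory, and the sample data (with a deliberately repeated row in the first file) are all made up for illustration.

```shell
tmp=$(mktemp -d)
# file1 repeats a row that file2 does not contain at all.
printf '%s\n' '5901 5902   8229' '5901 5902  17481' '5901 5902  17481' '5929 5930  12875' > "$tmp/file1.txt"
printf '%s\n' '5901 5902  8229' '5929 5930  12875' > "$tmp/file2.txt"

# seen[FILENAME, $0] is zero only for the first occurrence of a row in a file,
# so count[$0] ends up as the number of distinct files containing that row.
# A row is printed at the moment its count reaches the number of files (ARGC-1).
common=$(awk '{$1=$1}
     !seen[FILENAME, $0]++ && ++count[$0] == (ARGC-1)' "$tmp/file1.txt" "$tmp/file2.txt")
printf '%s\n' "$common"
```

Here the duplicated row 5901 5902 17481 is not reported, whereas a plain per-row counter would reach 2 on its second occurrence in the first file and print it.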

karakfa