202

I'm trying to write a simple script that will list the contents found in two lists. To simplify, let's use ls as an example. Imagine "one" and "two" are directories.

one=`ls one`
two=`ls two`
intersection $one $two

I'm still quite green in Bash, so feel free to correct how I am doing this. I just need some command that will print out all files in "one" and "two". They must exist in both. You might call this the "intersection" between "one" and "two".

Peter Mortensen
User1

5 Answers

341
comm -12  <(ls 1) <(ls 2)
ghostdog74
  • 50
    Can't believe I had no knowledge of `comm` until today. This just made my whole week :) – Darragh Enright Aug 19 '14 at 17:49
  • 31
    `comm` requires the inputs to be sorted. In this case, `ls` automatically sorts its output, but other uses may need to do this: `comm -12 <(some-command | sort) <(some-other-command | sort)` – Alexander Bird Jan 15 '15 at 21:11
  • 16
    DO NOT USE ls' output for anything. ls is a tool for interactively looking at directory metadata. Any attempt to parse ls' output with code is broken. Globs are much simpler AND correct: `for file in *.txt`. Read http://mywiki.wooledge.org/ParsingLs – Rany Albeg Wein Jan 25 '16 at 03:49
  • 2
    I just used this in an effort to find usages of a `public` method `error()` provided by a trait, in combination with `git grep`, and it was awesome! I ran `$ comm -12 <(git grep -il "\$this->error(" -- "*.php") <(git grep -il "Dash_Api_Json_Response" -- "*.php")`, and luckily I ended up with the name of the file only that contained the trait. – localheinz Apr 07 '17 at 15:45
  • 3
    This is hilarious. I was trying to do some crazy stuff with awk. – Rolf May 08 '17 at 23:36
  • 1
    For those who are wondering what 12 means: -1 suppress column 1 (lines unique to FILE1), -2 suppress column 2 (lines unique to FILE2), -3 suppress column 3 (lines that appear in both files) – Sriram Kannan Aug 11 '22 at 09:10
  • @DarraghEnright according to the OpenBSD [man page](http://man.openbsd.org/comm), "a comm command appeared in Version 4 AT&T UNIX". – Cristian Ciupitu Mar 27 '23 at 12:26
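To make the column flags described in the comments concrete, here is a minimal sketch. The list contents are made up for illustration and simply stand in for the output of `ls one` and `ls two`; both are already sorted, as `comm` expects.

```shell
# Two small, already sorted lists (hypothetical file names).
list_one=$(printf '%s\n' a.txt b.txt c.txt)
list_two=$(printf '%s\n' b.txt c.txt d.txt)

# No flags: three tab-indented columns
# (only in the first list, only in the second list, in both).
comm <(printf '%s\n' "$list_one") <(printf '%s\n' "$list_two")

# -12 suppresses columns 1 and 2, which leaves the intersection.
both=$(comm -12 <(printf '%s\n' "$list_one") <(printf '%s\n' "$list_two"))

# -23 leaves lines found only in the first list,
# -13 leaves lines found only in the second list.
only_one=$(comm -23 <(printf '%s\n' "$list_one") <(printf '%s\n' "$list_two"))
only_two=$(comm -13 <(printf '%s\n' "$list_one") <(printf '%s\n' "$list_two"))

printf 'in both: %s\n' "$both"
```

Storing the results in variables is only for demonstration; in a script you would usually pipe the output of `comm` straight into the next command.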
88
Solution with comm

comm is great, but it does need sorted input. Fortunately, we are using ls here, and according to the GNU Coreutils documentation:

By default, the output is sorted alphabetically, according to the locale settings in effect.

comm -12  <(ls one) <(ls two)
Alternative with sort

Intersection of two lists:

sort <(ls one) <(ls two) | uniq -d

Symmetric difference of two lists:

sort <(ls one) <(ls two) | uniq -u
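One caveat with the `sort | uniq` approach: it relies on each list containing no repeated lines. That holds for `ls` on a single directory, but probably not for arbitrary commands. The sketch below uses made-up lists to show the difference, and one possible workaround (running each list through `sort -u` first).

```shell
# Hypothetical lists: "x" appears twice in the first list and never in the second.
first=$(printf '%s\n' x x y)
second=$(printf '%s\n' y z)

# Naive version: "x" is reported even though it exists only in the first list.
naive=$(sort <(printf '%s\n' "$first") <(printf '%s\n' "$second") | uniq -d)

# De-duplicate each list first, then merge and keep the lines that occur twice.
safe=$(sort <(printf '%s\n' "$first" | sort -u) \
            <(printf '%s\n' "$second" | sort -u) | uniq -d)

printf 'naive: %s\n' $naive
printf 'safe:  %s\n' $safe
```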
Bonus

Play with it ;)

cd "$(mktemp -d)" && mkdir {one,two} && touch {one,two}/file_{1,2}{0..9} && touch two/file_3{0..9}
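As a rough check of what to expect from that playground, the sketch below builds the same layout (without changing directory, and under a variable name, `playground`, that is my own) and then counts the results of both commands.

```shell
# Same layout as the one-liner above, created under a temporary directory.
playground=$(mktemp -d)
mkdir "$playground/one" "$playground/two"
touch "$playground"/{one,two}/file_{1,2}{0..9}   # file_10..file_29 in both
touch "$playground"/two/file_3{0..9}             # file_30..file_39 only in "two"

# Intersection and symmetric difference, as in the answer.
common=$(comm -12 <(ls "$playground/one") <(ls "$playground/two"))
sym_diff=$(sort <(ls "$playground/one") <(ls "$playground/two") | uniq -u)

# Count the lines of each result.
common_count=$(printf '%s\n' "$common" | grep -c '')
sym_diff_count=$(printf '%s\n' "$sym_diff" | grep -c '')

echo "common: $common_count, symmetric difference: $sym_diff_count"
```

The 20 files created in both directories should show up in the intersection, and the 10 files that exist only in "two" should make up the symmetric difference.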
Cristian Ciupitu
Jean-Christophe Meillaud
33

Use the comm command:

ls one | sort > /tmp/one_list
ls two | sort > /tmp/two_list
comm -12 /tmp/one_list /tmp/two_list

"sort" is not strictly needed here, because "ls" already sorts its output, but I always include it before using "comm" in case the input comes from something else.

Cristian Ciupitu
DVK
  • 7
    It's good to include it since it does need to be sorted, and he only used ls as an example. – Vala Feb 28 '12 at 11:47
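To illustrate why the explicit sort matters once the input is no longer `ls`, here is a small sketch with deliberately unsorted, made-up lists. The exact behaviour of `comm` on unsorted input varies between implementations (GNU `comm` warns on stderr and may exit non-zero), so only the sorted result should be relied on.

```shell
# Made-up, deliberately unsorted lists (e.g. the output of some other command).
raw_one=$(printf '%s\n' pear apple fig)
raw_two=$(printf '%s\n' fig pear kiwi)

# Unsorted input: the result is unreliable and may miss shared lines.
unsorted_result=$(comm -12 <(printf '%s\n' "$raw_one") \
                           <(printf '%s\n' "$raw_two") 2>/dev/null) || true

# Sorted input: every shared line is found.
sorted_result=$(comm -12 <(printf '%s\n' "$raw_one" | sort) \
                         <(printf '%s\n' "$raw_two" | sort))

printf 'unsorted: %s\n' "$unsorted_result"
printf 'sorted:   %s\n' "$sorted_result"
```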
3

A less efficient (than comm) alternative:

cat <(ls 1 | sort -u) <(ls 2 | sort -u) | uniq -d
Benubird
  • 2
    If you are using Debian's /bin/dash or some other non-Bash shell in your scripts, you can chain commands' output using parentheses: `(ls 1; ls 2) | sort -u | uniq -d`. – nitrogen Oct 08 '14 at 20:19
  • 2
    @MikaëlMayer You should flag the name of the person you are replying to, otherwise it is assumed you mean me. – Benubird Feb 23 '15 at 08:34
  • 1
    @nitrogen MikaëlMayer is correct - chaining `sort -u | uniq -d` does nothing, because the sort has removed the duplicates before uniq starts to look for them. I think you have not understood what my command is doing. – Benubird Feb 23 '15 at 08:36
  • 1
    @Benubird I was not able to get your command `cat <(ls 1 | sort -u) <(ls 2 | sort -u) | uniq -d` to output anything either. My command should read `(ls 1; ls 2) | sort | uniq -d`, without the `-u`, to show list intersection. @MikaëlMayer was right that my original command was broken. – nitrogen Feb 24 '15 at 09:21
  • @nitrogen The reason why I'm using cat, is because I want this to be a generalizable solution, so that you can replace `ls` with something else, e.g. `find`. Your solution does not allow this, because if one of the commands returns two lines the same, it picks it up as a duplicate. Mine works even if the user wants to do `ls 1/*` and compare all files across subdirectories. Otherwise, yes, it works as well. It's possible mine is bash-specific. – Benubird Feb 24 '15 at 09:50
  • If anyone is interested you can try my version of "comm" which I called "common". It does not need sorting and supports "-123" switches just like "comm". https://github.com/toni-rmc/common – toni rmc May 29 '17 at 15:31
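Putting the comment thread together: `uniq` only compares adjacent lines, so concatenating two individually sorted lists is not enough, which may explain why the command printed nothing for nitrogen. A possible fix, sketched below, keeps the per-list `sort -u` (so repeated lines within one list are not mistaken for shared ones) and sorts the concatenation once more. The functions `list_a` and `list_b` are hypothetical stand-ins for `ls 1`, `find`, or any other command.

```shell
# Hypothetical commands whose output may repeat lines and is not sorted.
list_a() { printf '%s\n' c a a b; }
list_b() { printf '%s\n' d b c; }

# Concatenation alone: the shared lines are not adjacent, so uniq -d finds nothing.
adjacent_only=$(cat <(list_a | sort -u) <(list_b | sort -u) | uniq -d)

# Sorting the concatenation brings the shared lines next to each other.
intersection=$(cat <(list_a | sort -u) <(list_b | sort -u) | sort | uniq -d)

# The same idea without process substitution, for shells such as dash.
portable=$( { list_a | sort -u; list_b | sort -u; } | sort | uniq -d )

printf 'intersection: %s\n' $intersection
```

Note that "a" occurs twice in the first list but is not reported, because each list is de-duplicated before the two are merged.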
2

Join is another good option, depending on the input and the desired output:

join -j1 -a1 <(ls 1) <(ls 2)
frogstarr78
  • 3
    An explanation would be in order. E.g., why is it a good option? How is it different from `comm`? Why and when should it be used over `comm`? What is it supposed to do? Why options `-j1` and `-a1`? - why are they needed and what is their significance/meaning? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/22977016/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Nov 02 '21 at 01:40
  • 1
    I'm not teaching a class. The questions you asked can be found in the manual for the command. – frogstarr78 Aug 13 '22 at 22:56
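Since the options were left unexplained, here is a hedged sketch of how I understand them. `join` matches lines on a key field (the first field by default, so `-j1` appears to be redundant here), and `-a 1` additionally prints the lines of the first input that found no partner. That means the command in this answer prints more than the intersection; plain `join` without `-a` is closer to what the question asks for. The lists below are made up and already sorted, which `join` requires.

```shell
# Made-up, sorted single-word lists standing in for `ls 1` and `ls 2`.
left=$(printf '%s\n' a b c)
right=$(printf '%s\n' b c d)

# Plain join: only the lines whose first field occurs in both inputs.
joined=$(join <(printf '%s\n' "$left") <(printf '%s\n' "$right"))

# With -a 1: unpaired lines from the first input are printed as well.
with_a1=$(join -a 1 <(printf '%s\n' "$left") <(printf '%s\n' "$right"))

printf 'join:      %s\n' $joined
printf 'join -a 1: %s\n' $with_a1
```

Because `join` splits lines into whitespace-separated fields, file names containing spaces would be compared on their first word only, so `comm` is probably the safer choice for whole-line comparison.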