I have a file that stores a directed graph. Each line describes one edge in the form

node1 TAB node2 TAB weight

I want to find the set of all nodes, i.e. the union of the first and second columns. Is there a better way of getting this union? My current solution involves creating temporary files:

cut -f1 input_graph | sort | uniq > nodes1
cut -f2 input_graph | sort | uniq > nodes2
cat nodes1 nodes2 | sort | uniq > nodes
damned

1 Answer

{ cut -f1 input_graph; cut -f2 input_graph; } | sort | uniq

No need to sort twice.

The { cmd1; cmd2; } syntax is equivalent to (cmd1; cmd2) but may avoid a subshell.

In another language (e.g. Perl), you could slurp the first column into a hash and then process the second column sequentially.
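
A minimal sketch of that hash idea, using awk in place of Perl so it stays a shell one-liner. The function name graph_nodes_hash is hypothetical, and this variant simply records both columns in the same associative array instead of treating them in two passes:

```shell
# Hypothetical helper: print every distinct node of a TAB-separated edge list.
# Usage: graph_nodes_hash input_graph
graph_nodes_hash() {
    awk -F '\t' '
        { seen[$1] = 1; seen[$2] = 1 }     # record both endpoints of each edge
        END { for (n in seen) print n }    # iteration order is unspecified
    ' "$1"
}
```

The output is not sorted, so pipe it through sort if the order matters. No sorting is needed for deduplication itself, at the cost of holding all node names in memory.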

In Bash specifically, you can also avoid temporary files with process substitution, i.e. the syntax cat <(cmd1) <(cmd2). Bash takes care of creating the temporary file descriptors and setting up the pipelines.
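
Applied to the graph file, the process-substitution form might look like the sketch below. The function name graph_nodes_procsub is hypothetical, and the block assumes it is run by bash rather than a plain POSIX sh:

```shell
#!/usr/bin/env bash
# Requires bash: <( ... ) is process substitution and is not available in POSIX sh.
# Hypothetical helper; $1 is the TAB-separated edge-list file.
graph_nodes_procsub() {
    # Each <( ... ) appears to cat as a readable file name.
    cat <(cut -f1 "$1") <(cut -f2 "$1") | sort | uniq
}
```

Called as graph_nodes_procsub input_graph, it should produce the same node list as the { ...; } version above, with a single sort.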

In a script (where you may want to avoid requiring bash), if you end up needing temporary files, use mktemp.
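
As a rough sketch, the original three-step approach with mktemp-generated names could look like this. The function name graph_nodes_tmp is hypothetical, and the block assumes a mktemp that works without arguments (as in GNU coreutils and current BSD/macOS), since mktemp is not part of POSIX:

```shell
# Hypothetical helper: same steps as the question's code, but the temporary
# file names come from mktemp instead of being hard-coded.
graph_nodes_tmp() {
    nodes1=$(mktemp) || return 1
    nodes2=$(mktemp) || { rm -f "$nodes1"; return 1; }
    cut -f1 "$1" | sort | uniq > "$nodes1"
    cut -f2 "$1" | sort | uniq > "$nodes2"
    cat "$nodes1" "$nodes2" | sort | uniq
    rm -f "$nodes1" "$nodes2"    # remove the temporary files when done
}
```

This only illustrates the mktemp habit. For this particular task the pipeline at the top of the answer remains simpler, because it needs no temporary files at all.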

dan3
  • can you elaborate on `mktemp`? – damned Oct 23 '13 at 10:25
  • That was a marginal point, an FYI for your future scripting habits (which is why I mentioned it last). E.g. in your ORIGINAL code, you could use `mktemp` to generate temporary file names (instead of hard-coding filenames "nodes1" and "nodes2"): `NODES1=$(mktemp); cut -f1 input_graph | sort | uniq > "$NODES1"`. But of course there is no actual need for temporary files of any sort, with hard-coded names or not :) – dan3 Oct 23 '13 at 10:30