I don't really know how to formulate this, but I have a bunch of IATA codes and I want to generate all the possible combinations, e.g. JFK/LAX, BOS/JFK, etc., separated by a character such as "/" or "|".
-
When is a combination "possible"? Without restrictions there are infinitely many combinations. To make this finite, a restriction could be "inside one combination the same code may appear at most once" or "a combination contains at most X codes". – Socowi Sep 25 '21 at 18:07
-
2 by 2. For example, for JFK make: JFK/LAS, JFK/DEN, JFK/BOS, etc., with all the other IATA codes; once we're done with JFK, do the same for BOS: BOS/JFK, BOS/DEN, etc. – Ash Sep 25 '21 at 18:12
-
I don't want it to make combinations within a single IATA code, such as JFK, KFJ, JKF, etc. I don't know if "combinations" is the right word. – Ash Sep 25 '21 at 18:14
-
"Combination" is correct, but very broad. In your case "pair" (without `X/X`) would have been clearer :) – Socowi Sep 25 '21 at 18:29
1 Answer
Here we assume your IATA codes are stored in the file `file`, one code per line.
`crunch` has the `-q` option, which generates permutations of lines from a file. However, in this mode `crunch` ignores most of the other options like `<max-len>`, which would be important here to print only pairs of codes.
Therefore, it would be easier and faster to …
Use something different than `crunch`
For instance, try
join -j2 -t/ -o 1.1 2.1 file file | awk -F/ '$1!=$2'
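To see what this pipeline does, here is a small sketch on made-up sample data. The three codes and the temporary file are only placeholders for illustration, and the sketch assumes GNU `join` and a POSIX `awk`; other `join` implementations may treat the missing join field differently.

```shell
# Hypothetical sample data: three IATA codes, one per line.
file=$(mktemp)
printf '%s\n' JFK LAX BOS > "$file"

# With "/" as the field separator, no line has a second field, so the
# join field is empty everywhere and every line matches every line.
# join therefore emits the full cross product as CODE/CODE.
# awk then drops the pairs in which both codes are the same.
pairs=$(join -j2 -t/ -o 1.1 2.1 "$file" "$file" | awk -F/ '$1!=$2')
printf '%s\n' "$pairs"

rm -f "$file"
```

For three codes this should leave six ordered pairs, with both `JFK/LAX` and `LAX/JFK` present and no `JFK/JFK`.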
If you really, really, really want, you can …
Translate the input into something `crunch` can work with
We translate each line from `file` to a unique single character, supply that list of characters to `crunch`, and then translate the result back.
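The idea can be sketched on a toy scale with plain letters instead of Unicode characters. This is only an illustration under stated assumptions: the three sample codes are made up, and the character pairs are hard-coded as a stand-in for what `crunch 2 2 abc -d 1@` is expected to print, so that the sketch runs without `crunch` installed.

```shell
# Hypothetical sample data: line 1 -> a, line 2 -> b, line 3 -> c.
file=$(mktemp)
printf '%s\n' JFK LAX BOS > "$file"

# Stand-in for the output of: crunch 2 2 abc -d 1@
char_pairs=$(printf '%s\n' ab ac ba bc ca cb)

# First pass (NR==FNR): remember which letter stands for which line of the file.
# Second pass (stdin): translate both letters of each pair back into codes.
word_pairs=$(printf '%s\n' "$char_pairs" |
  awk 'NR==FNR { word[substr("abcdefghijklmnopqrstuvwxyz", NR, 1)] = $0; next }
       { print word[substr($0, 1, 1)] "/" word[substr($0, 2, 1)] }' "$file" -)
printf '%s\n' "$word_pairs"

rm -f "$file"
```

Plain letters only cover 26 lines, which is why the real script further down uses a much larger range of Unicode characters instead.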
`crunch` supports Unicode characters, so files with more than 255 lines are totally fine. Here we enumerate the lines in `file` by characters in Unicode's Supplementary Private Use Area-A. Therefore, `file` may have at most 65'534 lines.
If you need more lines, you could combine multiple Unicode planes, but at some point you might run into `ARG_MAX` issues. Also, with 65'534 lines you would already generate (a bit less than) 65'534^2 = 4'294'705'156 pairs, occupying more than 34 GB when translated into pairs of IATA codes.
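As a rough check of that estimate, the arithmetic can be redone in the shell. The 8 bytes per output line are an assumption: two three-letter codes, one separator, and one newline.

```shell
n=65534                    # maximum number of lines in the file
pairs=$(( n * (n - 1) ))   # ordered pairs of two different codes
bytes=$(( pairs * 8 ))     # assumed line length: 3 + 1 + 3 characters + newline
echo "$pairs pairs, $bytes bytes"
```

This gives 4294639622 pairs and 34357116976 bytes, i.e. a little over 34 GB.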
I suspect the back-translation to be a huge slowdown, so the alternative above seems to be better in every aspect (efficiency, brevity, maintainability, …).
# This assumes your locale is using any Unicode encoding,
# e.g. UTF-8, UTF-16, … (doesn't matter which one).
file=...
((offset=0xF0000))
charset=$(
  echo -en "$(bc <<< "obase=16;
    max=$offset+$(wc -l < "$file");
    for(i=$offset;i<max;i++) {\"\U\"; i}" |
    tr -d \\n
  )"
)
crunch 2 2 "$charset" -d 1@ --stdout |
  iconv -t UTF-32 |
  od -j4 -tu4 -An -w12 -v |
  awk -v o="$offset" 'NR==FNR{a[o+NR-1]=$0;next} {print a[$1]"/"a[$2]}' "$file" -

-
I..I seriously think your answer is above my level of understanding of crunch-wordlist, it would take me longer to make sense of what you said than to type every possible combination by hand – Ash Sep 25 '21 at 23:41
-
**tl;dr:** `crunch` cannot generate pairs of words, but it can generate pairs of characters. For the words `JFK LAX BOS` use `crunch 2 2 abc -d 1@` to generate `ab ac ba bc ca cb` and then replace `a`→`JFK` and `b`→`LAX` and `c`→`BOS` to get pairs of words. – Socowi Sep 26 '21 at 00:13
-
But as I said: It is easier to drop `crunch` entirely and use `join -j2 -t/ -o 1.1 2.1 file file | awk -F/ '$1!=$2'` instead. Do you need any help with that command too? – Socowi Sep 26 '21 at 00:17
-
-
@Ash Glad to hear this solved your problem. Please [accept](https://stackoverflow.com/help/someone-answers) this answer to close the question. – Socowi Sep 26 '21 at 10:01