
I don't really know how to phrase this, but I have a bunch of IATA codes and I want to generate all the possible combinations (e.g. JFK/LAX, BOS/JFK, etc.), separated by a character such as "/" or "|".

Ash
  • 1
    When is a combination "possible"? Without restrictions there are infinitely many combinations. To make this finite, a restriction could be "inside one combination the same code may appear at most once" or "a combination contains at most X codes". – Socowi Sep 25 '21 at 18:07
  • 2 by 2, for example for JFK make : JFK/LAS, JFK/DEN, JFK/BOS, ...etc, for all the other IATA codes, and once we're done with JFK, the same for BOS : BOS/JFK, BOS/DEN, ...etc – Ash Sep 25 '21 at 18:12
  • I don't want it to make combinations of the letters within an IATA code, such as JFK, KFJ, JKF, ...etc. idk if "combinations" is the right word – Ash Sep 25 '21 at 18:14
  • 1
    "Combination" is correct, but very broad. In your case "pair" (without `X/X`) would have been clearer :) – Socowi Sep 25 '21 at 18:29

1 Answer


Here we assume your IATA codes are stored in a file named `file`, one code per line.

`crunch` has the `-q` option, which generates permutations of the lines of a file. However, in this mode `crunch` ignores most of the other options, such as `<max-len>`, which would be needed here to print only pairs of codes.

Therefore, it would be easier and faster to …

Use something different than crunch

For instance, try

```shell
join -j2 -t/ -o 1.1 2.1 file file | awk -F/ '$1!=$2'
```
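To see what this produces, here is a small sketch on a hypothetical three-line sample file (the name `codes.txt` and the three codes are assumptions for illustration). Since no line contains `/`, field 2 is empty on every line, so `join -j2` pairs every line with every line; `awk` then drops the `X/X` self-pairs.

```shell
# Hypothetical sample input: one IATA code per line.
printf '%s\n' BOS JFK LAX > codes.txt

# Join on the (empty) second field so every line is paired with every line,
# then keep only the pairs whose two halves differ.
pairs=$(join -j2 -t/ -o 1.1 2.1 codes.txt codes.txt | awk -F/ '$1!=$2')
printf '%s\n' "$pairs"
```

With three codes this should print six ordered pairs, e.g. both `BOS/JFK` and `JFK/BOS`. For `|` as the separator, the same approach should work with `-t'|'` and `awk -F'|'`.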

If you really, really, really want, you can …

Translate the input into something crunch can work with

We translate each line from file to a unique single character, supply that list of characters to crunch, and then translate the result back.
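As a rough illustration of the back-translation step, here is a minimal sketch using plain letters instead of Unicode characters. The file name `codes.txt` is an assumption, and the list of two-letter words is hard-coded as a stand-in for the `crunch` output so that the sketch runs without `crunch` installed.

```shell
# Hypothetical sample input: line 1 maps to "a", line 2 to "b", line 3 to "c".
printf '%s\n' JFK LAX BOS > codes.txt

# Stand-in for the pairs of characters that crunch would generate
# for the charset "abc" (hard-coded here, not produced by crunch).
placeholders='ab ac ba bc ca cb'

# First input (NR==FNR): remember which code belongs to which letter.
# Second input: split each two-letter word and print the mapped pair.
translated=$(
  printf '%s\n' $placeholders |
  awk 'NR==FNR { code[substr("abc", NR, 1)] = $0; next }
       { print code[substr($0, 1, 1)] "/" code[substr($0, 2, 1)] }' codes.txt -
)
printf '%s\n' "$translated"
```

The letter-to-line mapping here is what limits the input to as many lines as there are distinct placeholder characters.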

`crunch` supports Unicode characters, so files with more than 255 lines are fine. Here we enumerate the lines in `file` using characters from Unicode's Supplementary Private Use Area-A (U+F0000 to U+FFFFD). Therefore, `file` may have at most 65'534 lines.
If you need more lines, you could combine multiple Unicode planes, but at some point you might run into `ARG_MAX` issues. Also, with 65'534 lines you would already generate (a bit less than) 65'534^2 = 4'294'705'156 pairs, occupying more than 34 GB once translated back into pairs of IATA codes.
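The size estimate can be checked with shell arithmetic. This sketch assumes 64-bit integer arithmetic in the shell and 8 bytes per output line (two three-letter codes, one separator, one newline).

```shell
lines=65534                  # code points in Supplementary Private Use Area-A
all=$(( lines * lines ))     # every ordered pair, including X/X
pairs=$(( all - lines ))     # without the X/X self-pairs
bytes=$(( pairs * 8 ))       # assumed: 3 + 1 + 3 characters plus a newline
echo "$all $pairs $bytes"
```

Under these assumptions the output comes to roughly 34.4 GB.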

I suspect the back-translation is a huge slowdown, so the alternative above seems better in every respect (efficiency, brevity, maintainability, …).

```shell
# This assumes your locale is using any Unicode encoding,
# e.g. UTF-8, UTF-16, … (doesn't matter which one).

file=...
((offset=0xF0000))
charset=$(
  echo -en "$(bc <<< "obase=16;
    max=$offset+$(wc -l < "$file");
    for(i=$offset;i<max;i++) {\"\U\"; i}" |
    tr -d \\n
  )"
)
crunch 2 2 "$charset" -d 1@ --stdout |
iconv -t UTF-32 |
od -j4 -tu4 -An -w12 -v |
awk -v o="$offset" 'NR==FNR{a[o+NR-1]=$0;next} {print a[$1]"/"a[$2]}' "$file" -
```
Socowi
  • I..I seriously think your answer is above my level of understanding of crunch-wordlist, it would take me longer to make sense of what you said than to type every possible combination by hand – Ash Sep 25 '21 at 23:41
  • 1
    **tl;dr:** `crunch` cannot generate pairs of words, but it can generate pairs of characters. For the words `JFK LAX BOS` use `crunch 2 2 abc -d 1@` to generate `ab ac ba bc ca cb` and then replace `a`→`JFK` and `b`→`LAX` and `c`→`BOS` to get pairs of words. – Socowi Sep 26 '21 at 00:13
  • 1
    But as I said: It is easier to drop `crunch` entirely and use `join -j2 -t/ -o 1.1 2.1 file file | awk -F/ '$1!=$2'` instead. Do you need any help with that command too? – Socowi Sep 26 '21 at 00:17
  • YEEEEEEES, thank you, exactly what I was looking for, thanks mate – Ash Sep 26 '21 at 09:31
  • @Ash Glad to hear this solved your problem. Please [accept](https://stackoverflow.com/help/someone-answers) this answer to close the question. – Socowi Sep 26 '21 at 10:01