I have two files that each contain n lines, with one string per line. I want to print the difference in characters between the corresponding lines of the two files. You can think of the operation as a sort of "subtraction" of letters. This is what it should look like:
List1 List2 Result
AaBbCcDd AaCcDd Bb
AaBbCcE AaBbCc E
AaBbCcF AaCcF Bb
This means that the second list is not sorted alphabetically, but all the substrings to remove are sorted within each string (Aa comes before Bb comes before Cc). Note that the elements to remove can be either 1 or 2 characters long (Aa or F), always starting with an uppercase letter that is sometimes followed by a lowercase letter. The strings are composed entirely of permutations of a few "elements" like Aa, Bb, Cc, Dd, E, F, Gg, ... and so on.
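To make the element rule concrete, here is a minimal sketch of splitting one string into its elements instead of into single characters. The helper name split_elements is hypothetical, and the sketch assumes bash (for the here-string) and a grep that supports the -o option, as GNU and BSD grep do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one element per line, where an element is
# an uppercase letter optionally followed by a single lowercase letter.
split_elements() { grep -oE '[[:upper:]][[:lower:]]?' <<< "$1"; }

# Prints Aa, Bb, Cc and E, each on its own line.
split_elements AaBbCcE
```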
This question has been answered in very similar form here: Bash script Find difference between two strings, but only for two strings entered manually, whereas I need to do the operation many hundreds of times. I am struggling to use files as the input to this command while also separating the characters correctly. Here is my adaptation:
split_chars() { sed $'s/./&\\\n/g' <<< "$1"; }
comm -23 <(split_chars AaBbCcDd) <(split_chars AaCcDd)
which gives as output
B
b
so it is still not quite what I want, even in this single case. I guess that the split_chars command is the key here, but I was not able to apply it to my files in any way. Putting the file names inside the brackets obviously does not work.
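One possible direction, sketched under assumptions rather than offered as a definitive solution: read both files in parallel, one line from each, and run the comm comparison on elements instead of single characters. The function names split_elements, subtract_elements and subtract_files are hypothetical. The sketch assumes bash, a grep with -o, that both files have the same number of lines, and that the elements within each string are already in sorted order as described above. LC_ALL=C is set so that comm compares the elements by byte value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one element per line (uppercase letter,
# optionally followed by one lowercase letter).
split_elements() { grep -oE '[[:upper:]][[:lower:]]?' <<< "$1"; }

# Print the elements of $1 that do not occur in $2, joined on one line.
# comm needs sorted input; the elements are assumed to be sorted already.
subtract_elements() {
    LC_ALL=C comm -23 <(split_elements "$1") <(split_elements "$2") | tr -d '\n'
    echo
}

# Pair line i of file $1 with line i of file $2 using two separate
# file descriptors, and subtract the second line from the first.
subtract_files() {
    local a b
    while IFS= read -r a <&3 && IFS= read -r b <&4; do
        subtract_elements "$a" "$b"
    done 3< "$1" 4< "$2"
}
```

With the two example files this would be called as subtract_files List1 List2, printing one result per line pair. The loop stops at the end of the shorter file, so any extra lines in the longer one are ignored.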
For reference, a simple
comm -23 List1 List2
just leads to
AaBbCcDd
AaBbCcEe
AaBbCcF
comm: file 2 is not in sorted order