Please note: I understand how to output the lines of one file that are not in another (here); my question is a little different.

In one file I have lines like

Андреев
Барбашев
Иванов
...

in a different file there are lines:

Барбашёв
Семёнов
...

Now. I need the lines from the second file, but only if you cannot find a line in the first where you substitute ё for е. For example Барбашёв should not display, because Барбашев is in the first.

If I do something like

comm -13 first.txt <(cat second.txt | sed 's/ё/е/g')

I get the correct lines; however, they have already been transformed by that point, which is unacceptable for what I'm trying to do.

In other words the output is:

Барбашев
...

While it should be

Барбашёв
...
Cœur
v010dya

1 Answer

You meant:

"Now. I need the lines from the second file, but only if you cannot find a line in the first when you substitute ё for е in the second file."

instead of

"Now. I need the lines from the second file, but only if you cannot find a line in the first where you substitute ё for е."

Right?

Without using a Cyrillic charset (with "t" standing in for "ё"), this solution works:

File test.awk:

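#!/usr/bin/gawk -f

{
    if (NR == FNR) {
        # First file: remember each line as an array key.
        arr[$1]++;
    } else {
        # Second file: normalize a copy and leave $1 itself untouched.
        tmp = $1;
        gsub("t", "e", tmp);   # "t" stands in for "ё" here

        # Print the original line only if its normalized form
        # does not occur in the first file.
        if (!(tmp in arr))
            printf("%s\n", $1);
    }
}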

Use:

$ ./test.awk file1 file2

If you replace "t" with "ё" (and the Latin "e" with the Cyrillic "е") in the gsub call, this should also work, I think. Maybe you can give it a try.
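For what it's worth, here is a minimal sketch of the same idea written with the actual Cyrillic letters. The file names first.txt and second.txt come from the question; the sample contents, the temporary directory, and the names `seen` and `key` are only illustrative. It assumes both files are UTF-8, and I have only reasoned it through for these sample lines, so treat it as a starting point rather than a tested answer.

```shell
#!/bin/sh
# Hypothetical sample data mirroring the two files in the question.
tmpdir=$(mktemp -d)
printf '%s\n' 'Андреев' 'Барбашев' 'Иванов' > "$tmpdir/first.txt"
printf '%s\n' 'Барбашёв' 'Семёнов' > "$tmpdir/second.txt"

# While reading first.txt (NR == FNR), remember every line.
# While reading second.txt, normalize a COPY of the line (ё -> е) and
# print the ORIGINAL line only if the copy was not seen in first.txt.
result=$(awk '
    NR == FNR { seen[$0]; next }
    {
        key = $0
        gsub(/ё/, "е", key)
        if (!(key in seen)) print $0
    }
' "$tmpdir/first.txt" "$tmpdir/second.txt")

printf '%s\n' "$result"   # prints: Семёнов
rm -rf "$tmpdir"
```

Because only the copy in `key` is changed, the printed line keeps its "ё", which is what the comm/sed pipeline from the question loses. One caveat: the NR == FNR test assumes first.txt is not empty.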

FloHe