2

I rewrote my previous question because it was unclear.

I have test1.txt formatted like this (this example has 3 lines):

Link;alfa (zz);some text;
Link;alfa (zz);other text;other text2;
Link;jack;

In another text file, test2.txt, I have plain strings without the ; delimiter (this example has 2 lines):

tommy emmanuel
alfa (zz)

test2.txt never contains the word Link and never contains the ; delimiter, but it can contain ( ) characters.

I want result.txt to contain:

Link;jack;

Logic: test2.txt contains alfa (zz). The same string appears in test1.txt as the second field (between ; delimiters) on the first and second lines. When a field matches like this, those lines should be deleted, which is why I expect only the 3rd line:

Link;jack;

I tested this code:

sed 's/.*Link;//;s/;.*//' test2.txt | grep -Fvf- test1.txt

and this:

awk -F \; '
FNR == NR {cull[$0]=""}
FNR != NR {
    for (str in cull) {
        if ($2 == str) {
            next
        }
    }
    print
}' test2.txt test1.txt > culled.txt

The problem is that both commands output the same lines unchanged and do not delete the lines with matching fields.

Updated question:

Following anubhava's answer, lines are still not removed for strings like the ones in this example.

If test2.txt contains

Dark Tranquillity - A Moonclad Reflection [ep] (1992) Melodic Death Metal
Dark Tranquillity - A Closer End [best of_compilation] (2008) Melodic Death Metal 

then the following lines in test1.txt are not matched and removed:

Link;Dark Tranquillity - A Moonclad Reflection [ep] (1992) Melodic Death Metal;Dark Tranquillity - A moonclad reflection [7'' Ep 1992_Slaughter Rec.].rar;https://disk.yandex.com/public?hash=JA7Gu2CysxSf2HhAKaBxmU%2By27B6dPd6uRwPFu%2B9x0s%3D;https://metalarea.org/forum/index.php?showtopic=5037

Link;Dark Tranquillity - A Closer End [best of_compilation] (2008) Melodic Death Metal;Dark Tranquillity - A Closer End [2008].rar;https://disk.yandex.com/public?hash=RCZbOrqci8lX%2Fa%2BPzhB6vchlr5rXyc%2B2NHiJNCu%2BQYM%3D;https://metalarea.org/forum/index.php?showtopic=48557
Jack Rock
  • I can't reproduce your problem - with the sample snippets the awk script does the right thing. – tink Aug 27 '22 at 18:33
  • 2
    After latest edits and the data samples provided by OP it is clear that none of the original commands will work to solve the problem. Easily reproducible. – anubhava Aug 28 '22 at 05:04
  • 2
    Please read [why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it) to learn what's almost certainly causing your problem and how to fix it. – Ed Morton Aug 28 '22 at 11:47

2 Answers

2

You may use this awk solution:

awk -F';' 'FNR == NR {
   gsub(/^[[:space:]]+|[[:space:]]+$/, "")
   cull[$0]
   next
}
!($2 in cull)' test2.txt test1.txt > culled.txt

cat culled.txt

Link;Dark Tranquillity - Enter Suicidal Angels [ep] (1996) Melodic Death Metal;Dark Tranquillity 1996 - Enter Suicidal Angels (EP).rar;https://disk.yandex.com/public?hash=fBvwBTBJ8%2Fx1mWXvl7usrAMe06esHZFDmHJWF8E2T6LK7Wvfu9Q5Qja9cb5JAU%2Fzq%2FJ6bpmRyOJonT3VoXnDag%3D%3D;https://metalarea.org/forum/index.php?showtopic=5041
Link;Sea Of Tranquillity - Darkened [demo] (1993) Death_Thrash Metal;eaofT3Dd.7z;https://disk.yandex.com/public?hash=DzCNqfEv2pydYzB0YWVvetV2Jx8QDCwktop3y8PIC%2BD5W%2Fnt8ikX81%2F7cf49g8dNq%2FJ6bpmRyOJonT3VoXnDag%3D%3D;https://metalarea.org/forum/index.php?showtopic=153504
Link;Dark Tranquillity - The Absolute [single] (2017) Melodic Death Metal (D);Dark Tranquillity - The Absolute (2017) Single MCD (+SATANIST666+).rar;https://cloud.mail.ru/public/ckFc/A5sQ6pqhb;
Link;Dark Tranquillity - Trail Of Life Decayed [ep] (1992) Melodic Death Metal;1991 - Trail Of Life Decayed.7z;https://www.mediafire.com/file/5pi74bqvujea9rg/1991_-_Trail_Of_Life_Decayed.7z/file;https://metalarea.org/forum/index.php?showtopic=5115
Link;Dark Tranquillity - Phantom Days [single] (2020) Melodic Death Metal (D);Dark Tranquillity - Phantom Days (2020) by Andrew.rar;https://www.mediafire.com/file/ho25i02j3ybgmty/Dark_Tranquillity_-_Phantom_Days_%25282020%2529_by_Andrew.rar/file;https://metalarea.org/forum/index.php?showtopic=341548
Link;Dark Tranquillity - Of Chaos And Eternal Night [ep] (1995) Melodic Death Metal;Dark Tranquillity - Of Chaos And Eternal Night (EP) [1995].rar;https://disk.yandex.com/public?hash=Ax%2B2Gfqzr9%2FdS87cgRcUhGBCoQzKZfz5ZDUa2U%2Fbsn4%3D;https://metalarea.org/forum/index.php?showtopic=5040
Link;Dark Tranquillity - Of Chaos And Eternal Night [ep] (1995) Melodic Death Metal;Dark Tranquillity 1995 - Of Chaos And Eternal Night (EP).rar;https://disk.yandex.com/public?hash=le8r7ZI%2F%2BTw2CjsDbNFriDZdCGpSy1hj%2BoQdGHrrBFdcJM8eIp%2F3J17qG5MjC1Fgq%2FJ6bpmRyOJonT3VoXnDag%3D%3D;https://metalarea.org/forum/index.php?showtopic=5040

There is no need for a for loop. Build an associative array (cull) from the lines of test2.txt, which awk reads first, and then, while reading test1.txt, print only the lines whose 2nd field is not a key in that array.
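To try the lookup approach on the small sample from the question, here is a minimal sketch. The scratch directory created with `mktemp -d` and the `result` variable are assumptions for illustration only; the awk program is the one shown above.

```shell
# Recreate the small sample from the question in a scratch directory
# (the directory and variable names are illustrative assumptions).
tmpdir=$(mktemp -d)
printf '%s\n' 'Link;alfa (zz);some text;' \
              'Link;alfa (zz);other text;other text2;' \
              'Link;jack;' > "$tmpdir/test1.txt"
printf '%s\n' 'tommy emmanuel' 'alfa (zz)' > "$tmpdir/test2.txt"

# Pass 1 (test2.txt): store each trimmed line as a key of cull.
# Pass 2 (test1.txt): print a line only if its 2nd field is not a key.
result=$(awk -F';' 'FNR == NR {
   gsub(/^[[:space:]]+|[[:space:]]+$/, "")
   cull[$0]
   next
}
!($2 in cull)' "$tmpdir/test2.txt" "$tmpdir/test1.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this sample only `Link;jack;` should remain, because `alfa (zz)` is a key of `cull` and both lines carrying it in the 2nd field are skipped.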

Ed Morton
anubhava
  • I tested your script but it fails in this case, please look at [this pastebin](https://pastebin.com/raw/JRQD9m5L) - for example, if test1.txt has only 2 lines and one of them contains `Dark Tranquillity - A Moonclad Reflection [ep] (1992) Melodic Death Metal` from test2.txt, then your script removes that line, but not when there are more lines. I don't understand why – Jack Rock Aug 27 '22 at 19:43
  • Well ... that will be because e.g. `Dark Tranquillity - Enter Suicidal Angels [ep] (1996), Melodic Death Metal` and `Dark Tranquillity - Enter Suicidal Angels [ep] (1996) Melodic Death Metal` aren't identical ... notice the `,` in the former? – tink Aug 27 '22 at 19:54
  • Sorry, I didn't notice the `,` character, but if you test it you can still see that it doesn't remove the lines with `Dark Tranquillity - A Moonclad Reflection [ep] (1992) Melodic Death Metal` and `Dark Tranquillity - A Closer End [best of_compilation] (2008) Melodic Death Metal` inside *test1.txt* – Jack Rock Aug 28 '22 at 00:12
  • 1
    @anubhava ok, I updated the question with these examples – Jack Rock Aug 28 '22 at 03:10
  • @JackRock: It seems you have trailing spaces in your `test2` file. I have added `gsub` call in my code to address this. – anubhava Aug 28 '22 at 04:56
  • @anubhava I tested again but it still fails. Please take a look at [my short video](https://mega.nz/file/lwZ0xJyL#eK2jl1PjZqi69VNght8l14ZEG3nlRMnm2P4wz2b41JQ) where I show how your solution fails. I'm not sure the problem is only trailing spaces – Jack Rock Aug 28 '22 at 11:05
  • @anubhava sorry, **four** lines are not removed in culled.txt. As the source for test1 and test2 I used the [pastebin](https://pastebin.com/raw/JRQD9m5L) example. I also tested in a Linux Ubuntu terminal and got the same problem, so it is not a Cygwin error – Jack Rock Aug 28 '22 at 11:26
  • 2
    The OP almost certainly has DOS line endings (or some other white space that's not included in `[[:blank:]]`) in test2.txt and CR isn't included in `[[:blank:]]`. This is one of the reasons I always use `[[:space:]]` instead of `[[:blank:]]` unless I specifically NEED to exclude some of the white space characters like CR. – Ed Morton Aug 28 '22 at 11:42
  • 2
    @EdMorton your `[[:blank:]]` to `[[:space:]]` replacement solved the problem, thanks. – Jack Rock Aug 28 '22 at 13:03
  • 2
    @JackRock good, I updated this answer with that change. – Ed Morton Aug 28 '22 at 13:04
  • 1
    @EdMorton: Thank you so much for your suggestion of DOS line break and your edit of my answer to make it a working solution. Jack: Thanks for edits in question and good sample data. – anubhava Aug 28 '22 at 15:05
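The DOS line ending issue discussed in these comments can be reproduced with a small sketch. The scratch directory, the sample contents, and the variable names are assumptions for illustration. `[[:blank:]]` covers only space and tab, so a trailing carriage return survives that trim and the stored key never equals the 2nd field, while `[[:space:]]` strips it.

```shell
tmpdir=$(mktemp -d)
# test2.txt written with CRLF line endings, as a Windows editor might save it.
printf 'alfa (zz)\r\n' > "$tmpdir/test2.txt"
printf '%s\n' 'Link;alfa (zz);some text;' 'Link;jack;' > "$tmpdir/test1.txt"

# Count the lines of test2.txt that contain a carriage return.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr" "$tmpdir/test2.txt")

# Trimming with [[:blank:]] leaves the CR in the key, so no line is removed.
with_blank=$(awk -F';' 'FNR == NR {
   gsub(/^[[:blank:]]+|[[:blank:]]+$/, "")
   cull[$0]
   next
}
!($2 in cull)' "$tmpdir/test2.txt" "$tmpdir/test1.txt")

# Trimming with [[:space:]] also strips the CR, so the matching line is removed.
with_space=$(awk -F';' 'FNR == NR {
   gsub(/^[[:space:]]+|[[:space:]]+$/, "")
   cull[$0]
   next
}
!($2 in cull)' "$tmpdir/test2.txt" "$tmpdir/test1.txt")

printf 'lines with CR: %s\n' "$cr_lines"
printf 'blank trim:\n%s\n' "$with_blank"
printf 'space trim:\n%s\n' "$with_space"
rm -rf "$tmpdir"
```

Checking for carriage returns first (here with `grep -c`) is a quick way to tell whether line endings, rather than the awk logic, explain a failed match.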
1

This might work for you (GNU sed):

sed -E '1{x;s/^/cat file2/e;x};G;/^Link;([^;]*);.*\n\1(\n|$)/!P;d' file1

Gather up file2 in the hold space.

Append the hold space to each line in file1 and, if the second field of that line matches any whole line of file2, delete the line.

Otherwise print the first line in the pattern space i.e. the current line in file1.
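The steps above can be tried on the small sample from the question with the sketch below. It assumes GNU sed (the `e` flag of `s` is a GNU extension), Unix line endings in both files, and a scratch directory from `mktemp -d`. The names `file1` and `file2` follow the command above, and the `cd` is needed because `cat file2` inside the script is resolved relative to the current directory.

```shell
# Assumes GNU sed; the scratch directory and result variable are illustrative.
tmpdir=$(mktemp -d)
printf '%s\n' 'Link;alfa (zz);some text;' \
              'Link;alfa (zz);other text;other text2;' \
              'Link;jack;' > "$tmpdir/file1"
printf '%s\n' 'tommy emmanuel' 'alfa (zz)' > "$tmpdir/file2"

# On line 1, load file2 into the hold space; for every line, append the
# hold space and print the line only if its 2nd field is not a line of file2.
result=$(cd "$tmpdir" &&
  sed -E '1{x;s/^/cat file2/e;x};G;/^Link;([^;]*);.*\n\1(\n|$)/!P;d' file1)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this sample only `Link;jack;` should be printed. A carriage return at the end of the lines in file2 would stop the back-reference from matching, in which case every line would be printed unchanged.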

potong
  • 55,640
  • 6
  • 51
  • 83
  • I don't understand your solution well. It doesn't seem to work because it returns the same lines. I replaced `file2.t/e;x}` and `file1` with `test2.txt.t/e;x}` and `test2.txt` – Jack Rock Aug 28 '22 at 13:09