This code:
perl -pe 's/^(\D\w+ \w+)( word )/\1;word/gi'
doesn't work when the input contains words with accented or special characters such as á or Ș.
Clarification:
I use this code to count the files per artist.
find /PATH/ -type f -exec basename "{}" + 2>/dev/null |
perl -pe 's/ - .*//g' | LC_ALL=C sort -f | uniq -c -i |
gsed -e 's/$/;/' |
awk '{numero=$1;$1=""}{print $0,numero}' |
perl -pe 's/^(\D\w+ \w+)( & )/\1;&/g' |
perl -pe 's/^(\D\w+ \w+ \w+)( & )/\1;&/g' |
perl -pe 's/^(\D\w+ \w+ \w+ \w+)( & )/\1;&/g' |
perl -pe 's/^(\D\w+ \w+ \w+ \w+ \w+)( & )/\1;&/g' |
perl -pe 's/^(\D\w+ \w+)( Con )/\1;Con/gi' |
perl -pe 's/^(\D\w+ \w+ \w+)( Con )/\1;Con/gi' |
perl -pe 's/^(\D\w+ \w+ \w+ \w+)( Con )/\1;Con/gi' |
perl -pe 's/^(\D\w+ \w+ \w+ \w+ \w+)( Con )/\1;Con/gi' |
perl -pe 's/^(\D\w+ \w+)( Și )/\1;Și/gi' |
perl -pe 's/^(\D\w+ \w+ \w+)( Și )/\1;Și/gi' |
perl -pe 's/^(\D\w+ \w+ \w+ \w+)( Și )/\1;Și/gi' |
perl -pe 's/^(\D\w+ \w+ \w+ \w+ \w+)( Și )/\1;Și/gi' > /PATH/File.txt
I have these files:
Betty Curtis & Orchestra - Song Title
Betty Curtis Con Johnny Dorelli - Song Title
Betty Curtis - Song Title
Margareta Pâslaru - Song Title
Margareta Pâslaru & Grup - Song Title
Margareta Pâslaru Și Sincron - Song Title
Matilde Sánchez - Song Title
Matilde Sánchez Con El Mariachi Vargas De Tecalitlán - Song Title
The desired output would be:
Betty Curtis; 3
Margareta Pâslaru; 3
Matilde Sánchez; 2
The output I get instead is:
Betty Curtis; 3
Margareta Pâslaru; 1
Margareta Pâslaru & Grup; 1
Margareta Pâslaru Și Sincron; 1
Matilde Sánchez; 1
Matilde Sánchez Con El Mariachi Vargas De Tecalitlán; 1
Admittedly, the code is very complicated (the entire script is nineteen lines long...). The rule is to truncate the name at a conjunction or parenthesis, unless the name consists of a single word. If there are no conjunctions or parentheses, the name is kept in full,
e.g. “Gervis Quebodeaux Rayne Serenaders” remains “Gervis Quebodeaux Rayne Serenaders;”
I'd like to compact the "perl -pe" section: repeating (\D\w+ \w+), (\D\w+ \w+ \w+), etc. is tedious. But I don't know how to do it.
I had to find a balance between summarizing enough to make the count and keeping as much information as possible.
At the moment I have 30 cases (rules): in addition to “&” there are “ With ”, “ Con ”, “ e ”, “ Y ”, “ Et ”, “ Und ”, etc., in many languages of the world.
The script works fine, except for names that contain accented or special letters.
The script works like this:
For example, I have many Duke Ellington files, with many different historical artist credits.
Duke Ellington: 2 files
Duke Ellington & Cotton Club O.: 3
Duke Ellington & His Famous O.: 7
Duke Ellington & His Famous O.;(Ft. Ben Webster): 4
Duke Ellington & His Famous O.;(Ft. Johnny Hodges): 3
Duke Ellington & His O.: 129
Duke Ellington & His O. (ft. Ben Webster): 14
Duke Ellington & His O. (Ft. Johnny Hodges): 8
Duke Ellington & His O. (pn.): 2
Duke Ellington &His O. (v. Al Hibble): 1
Duke Ellington &His O. (v. Al Hibbler): 1
Duke Ellington &His O. (v. Herb Jeffries): 9
Duke Ellington &His O. (v. Ozzie Bailey): 1
Duke Ellington &His O. (v. Ozzie Bailey, Ray Nance Vln.): 1
Duke Ellington &His O.;(v. Ray Nance?): 1
Duke Ellington &His O.;(v.M): 1
Duke Ellington (Ft. Rhythm Boys (2°c Bing Crosby, Al Rinker, & Harry Barris)): 1
Duke Ellington (Ft. Rhythm Boys (Bing Crosby, Al Rinker, & Harry Barris)): 1
Duke Ellington (v. Dick Robertson): 1
Duke Ellington w Count Basie: 3
Duke Ellington w Gerald Wilson: 13
Duke Ellington’s Spacemen: 1
Duke Ellington’s Washingtonians: 1
The script processes these and produces this file:
Duke Ellington; 2
Duke Ellington;&Cotton Club O.; 3
Duke Ellington;&His Famous O.; 7
Duke Ellington;&His Famous O.;(Ft. Ben Webster); 4
Duke Ellington;&His Famous O.;(Ft. Johnny Hodges); 3
Duke Ellington;&His O.; 129
Duke Ellington;&His O.;(ft. Ben Webster); 14
Duke Ellington;&His O.;(Ft. Johnny Hodges); 8
Duke Ellington;&His O.;(pn.); 2
Duke Ellington;&His O.;(v. Al Hibble); 1
Duke Ellington;&His O.;(v. Al Hibbler); 1
Duke Ellington;&His O.;(v. Herb Jeffries); 9
Duke Ellington;&His O.;(v. Ozzie Bailey); 1
Duke Ellington;&His O.;(v. Ozzie Bailey, Ray Nance Vln.); 1
Duke Ellington;&His O.;(v. Ray Nance?); 1
Duke Ellington;&His O.;(v.M); 1
Duke Ellington;(Ft. Rhythm Boys (2°c Bing Crosby, Al Rinker, & Harry Barris)); 1
Duke Ellington;(Ft. Rhythm Boys (Bing Crosby, Al Rinker, & Harry Barris)); 1
Duke Ellington;(v. Dick Robertson); 1
Duke Ellington;w Count Basie; 3
Duke Ellington;w Gerald Wilson; 13
Duke Ellington; Spacemen; 1
Duke Ellington; Washingtonians; 1
This is the output:
Duke Ellington: 208
Complete code: https://www.sendspace.com/file/dlep9q