How to get paragraphs of text by index number

Question

I am wondering if there is a way to get paragraphs of text by number (the source would be a pyx file), the way sed does with lines:

sed -n ${i}p

At the moment I'm interested in using awk with:

awk '/custom-pyx-tag$/,/$custom-pyx-tag/'

but I can't find documentation or examples about that.

I'm also trying to trim "\r\n" with gsub(/\r\n/,"; ") in the same awk command, but it doesn't work, and I can't figure out why.

Any hint would be much appreciated, thanks.

EDIT:

This is just one example and not my exact need, but I need to know how to do this for a multipurpose project.

Let's say I have exported the ID3 tags of a huge collection of audio files and stored them in a pyx-like format, so that I end up with one big file in which this pattern repeats for each file in the collection:

audio-genre(
blablabla
)audio-genre
audio-artist(
bla.blabla
)audio-artist
audio-album(
bla-bla-bla
)audio-album
audio-track-num(
0x
)audio-track-num
audio-track-title(
bla.bla-bla
)audio-track-title
audio-lyrics(
blablablablabla
bla.bla.bla.bla
blah-blah-blah
blabla-blabla
)audio-lyrics
...

Now if I want to extract the artist of the 1234th audio file I can use:

awk '/audio-artist\(/, /)audio-artist/' | sed '/audio-artist/d' | sed -n 1234p

Since the artist is a single line, it can be picked out with sed, but I don't know how to get an entire paragraph given its index. For example, how could I get the lyrics of the 6543rd file?

In the end, the question is simply whether there is a command equivalent to sed -n ${num}p, but for paragraphs.

Welcome to SO and special thanks for showing your efforts in your question. Could you please do add sample of input and sample of expected output in your question and let us know then for better understanding of question. — RavinderSingh13, Nov 06 '20 at 08:19
for `\r\n`, see https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it ... for paragraphs, you can set `RS` to empty string and then use `NR` or `FNR` in `awk` (sed won't suit here) — Sundeep, Nov 06 '20 at 08:23
Untested so not an answer: `perl -00 -ne 'print if $. == '"$i" input.txt` — Shawn, Nov 06 '20 at 11:40
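
To illustrate the RS/NR suggestion from the comments, here is a minimal sketch. It assumes the paragraphs are separated by one or more blank lines, which is not the case for the tag-delimited sample in the question. The function name nth_paragraph and the input text are made up for the example.

```shell
# Print the n-th blank-line-separated paragraph of standard input,
# much like sed -n "${n}p" does for single lines.
nth_paragraph() {
    awk -v n="$1" '
        BEGIN { RS = "" }            # paragraph mode: blank lines end a record
        NR == n { print; exit }      # NR now counts paragraphs, not lines
    '
}

# Made-up input: three paragraphs separated by blank lines.
printf 'first line\nsecond line\n\nthird line\n\n\nfourth line\nfifth line\n' |
    nth_paragraph 3
```

If the file has Windows line endings, the "blank" lines still contain a \r and are not treated as empty, so the carriage returns would need to be stripped first, for example with tr -d '\r'.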

score 0 · Answer 1 · answered Nov 09 '20 at 15:26

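awk -v indx=1234 '
BEGIN {
    RS = ""    # paragraph mode: a file without blank lines is read as one record
}
{
    # Split on the tag name; the even-numbered pieces hold "(", the artist and ")"
    n = split($0, arr, "audio-artist")
    for (i = 2; i <= n; i += 2) {
        gsub(/[()]/, "", arr[i])    # strip the brackets left over from the tags
        arts[++cnt] = arr[i]        # store the artist under a running index
    }
}
END {
    print arts[indx]
}' audioartist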

One liner:

awk -v indx=1234 'BEGIN {RS=""} NR==1 { split($0,arr,"audio-artist");for (i=2;i<=length(arr);i=i+2) { gsub("[()]","",arr[i]);arts[cnt+=1]=arr[i] } } END { print arts[indx] }' audioartist

Using awk on the file called audioartist, we set the record separator (RS) to "". This puts awk in paragraph mode, where records are separated by blank lines, so the whole file is read as a single record provided it contains no blank lines. We then split that record into an array arr, using audio-artist as the separator. We loop through arr from index 2 in steps of 2 to the end of the array and strip out the opening and closing brackets, creating another array called arts with an incrementing count as the index and the stripped artist as the value. Note that each stored value keeps the newlines that surrounded the artist, and that any brackets inside an artist name are stripped as well. At the end we print the entry of arts selected by the passed indx variable (in this case 1234).
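
The same idea can be extended to multi-line fields such as the lyrics without holding the whole file in memory. The following is only a sketch, not part of the answer above: it assumes every opening tag name( and closing tag )name sits alone on its own line, as in the sample in the question, and the function name nth_block is made up. It also strips a trailing \r from each line: with the default record separator awk removes the newline before the program sees the line, so a pattern like /\r\n/ can never match and only the \r is left to remove.

```shell
# Print the body of the n-th "tag( ... )tag" block read from standard input.
# Usage: nth_block TAG N < file
nth_block() {
    awk -v tag="$1" -v n="$2" '
        { sub(/\r$/, "") }                       # drop a trailing CR (Windows line endings)
        $0 == (tag "(") { inside = (++count == n); next }   # opening tag: count it
        $0 == (")" tag) { if (inside) exit; next }          # closing tag: stop after block n
        inside          { print }                # body lines of the requested block
    '
}

# Made-up two-file sample in the format from the question.
sample='audio-artist(
first.artist
)audio-artist
audio-lyrics(
la la la
)audio-lyrics
audio-artist(
second.artist
)audio-artist
audio-lyrics(
line one
line two
)audio-lyrics'

printf '%s\n' "$sample" | nth_block audio-lyrics 2
```

Because the tags are compared as plain strings rather than regular expressions, characters such as ( or . in a tag name need no escaping.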

thanks, I solved in an another way but I saved your line, it would be useful to understand awk better, thanks a lot — Michele Frau, Nov 11 '20 at 00:15
