Manipulate text by searching for three patterns (sed, awk, pcregrep)

Question

I have this text file:

AAAA
1234
title example
Lorem Ipsum
FF
AAAA
1234
title example
€330 - Roma
FF

I want to extract from this file only the blocks of text that:

start with AAAA
contain the Euro symbol (€)
end with FF

In this case I want to match only this block:

AAAA
1234
title example
€330 - Roma
FF

I tried different solutions. For example, I used:

sed -e '/AAAAs/,/europ/,/FF/!d' testfile.txt

but it extracts all the text between AAAA and FF.

How can I solve this?

Thanks for your help.

EDIT:

Between the euro line and FF there could be some text; I don't know how many lines:

AAAA
1234
title example
€330 - Roma
Some text with \n, comma symbol etc etc
FF

I want to extract the text between AAAA and FF.

Does the line with `€` always occur at 4th line from `AAAA` and just before `FF`? — Inian, Mar 03 '17 at 08:47
The structure is a little bit tricky: AAAA, a title, the euro symbol, text containing \n, and FF – Francesco, Mar 03 '17 at 09:07

SLePort · Answer 1 · 2017-03-03T10:40:27.783

3

With sed:

 sed -n '/^AAAA/{:a;N;/\nFF/!ba; /€/p}' file

How it works:

/^AAAA/: on a line starting with AAAA, run the commands in braces
:a: defines label a for the upcoming loop
N: appends the next line to the pattern space
/\nFF/!: if a newline followed by FF is not found in the pattern space,
ba: branches back to label a to append another line
/€/p: prints the pattern space if it contains €
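
For anyone who finds the sed loop hard to follow, here is a rough Python sketch of the same steps. The function name `blocks_with_euro` and the `sample` string are made up for illustration and are not part of the answer; it only approximates what sed does with its pattern space.

```python
def blocks_with_euro(lines):
    """Yield each AAAA..FF block (as one string) that contains a euro sign."""
    buffer = None                          # plays the role of sed's pattern space
    for line in lines:
        if buffer is None:
            if line.startswith("AAAA"):    # /^AAAA/ : a block starts here
                buffer = [line]
            continue
        buffer.append(line)                # N : append the next line
        if line.startswith("FF"):          # /\nFF/ : closing line reached
            block = "".join(buffer)
            if "€" in block:               # /€/p : emit only blocks with a euro sign
                yield block
            buffer = None                  # wait for the next AAAA line


# Hypothetical sample data, modelled on the file in the question.
sample = """AAAA
1234
title example
Lorem Ipsum
FF
AAAA
1234
title example
€330 - Roma
Some text
FF
"""

for block in blocks_with_euro(sample.splitlines(keepends=True)):
    print(block, end="")
```

A block that never reaches an FF line is silently dropped here, which is an assumption about the desired behaviour rather than something the question specifies.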

Edit:

As suggested by @potong in the comments, with GNU sed you can also use the M flag so that ^ matches at the start of each line in the pattern space (multi-line mode):

sed -n '/^AAAA/{:a;N;/^FF/M!ba; /€/p}' file
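
The multi-line idea carries over to other regex engines. As a hedged illustration, the sketch below uses Python's `re.MULTILINE` so that `^` matches at every line start inside one big string, then keeps only the blocks containing €. The names `BLOCK` and `euro_blocks` are hypothetical, and reading the whole file into memory is an assumption that suits small files only.

```python
import re

# ^AAAA ... up to the first following line that starts with FF.
# MULTILINE: ^ matches at each line start; DOTALL: . also matches newlines.
BLOCK = re.compile(r"^AAAA.*?^FF[^\n]*\n?", re.MULTILINE | re.DOTALL)


def euro_blocks(text):
    """Return every AAAA..FF block in text that contains a euro sign."""
    return [m.group(0) for m in BLOCK.finditer(text) if "€" in m.group(0)]


text = (
    "AAAA\n1234\ntitle example\nLorem Ipsum\nFF\n"
    "AAAA\n1234\ntitle example\n€330 - Roma\nFF\n"
)
for block in euro_blocks(text):
    print(block, end="")
```

The lazy `.*?` matters: a greedy `.*` would run from the first AAAA to the last FF and merge both blocks into one.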

edited Mar 03 '17 at 10:40

answered Mar 03 '17 at 09:01

If using GNU sed, `sed -n '/^AAAA/{:a;N;/^FF/M!ba; /€/p}' file` may also appeal. – potong Mar 03 '17 at 09:34
@user3720159 Glad it works. I added some explanations. – SLePort Mar 03 '17 at 10:19
@SLePort - Can you please explain this part - "!ba; " – VIPIN KUMAR Mar 03 '17 at 10:35
@VIPINKUMAR `ba` is for looping(`b` for branch) to `a` label. I added a line for it in answer. – SLePort Mar 03 '17 at 10:46

Nick H · Answer 2 · 2017-03-03T09:19:05.893

1

A quick way would be to use grep with context options (-B for lines before the match, -A for lines after). So for your needs:

grep -B3 -A1 -e '€' test.txt

This will find the Euro symbol and print the 3 lines before it and the 1 line after it. However, this only works if the file keeps the same pattern, i.e. AAAA and FF always occur the same number of lines above and below the € line.
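
To make that limitation concrete, here is a small Python sketch that imitates fixed-size context around a match. It is a simplified stand-in for `grep -B3 -A1`, not a reimplementation (real grep also prints `--` between separate groups), and `grep_context`, `fixed` and `longer` are names invented for this example.

```python
def grep_context(lines, needle, before, after):
    """Return the lines around each match, keeping a fixed number of context lines."""
    keep = set()
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))
    return [lines[i] for i in sorted(keep)]


# Block from the original question: FF comes right after the euro line.
fixed = ["AAAA", "1234", "title example", "€330 - Roma", "FF"]
# Block shaped like the question's EDIT: extra lines before FF.
longer = ["AAAA", "1234", "title example", "€330 - Roma", "Some text", "more text", "FF"]

print(grep_context(fixed, "€", 3, 1))    # the whole block is returned
print(grep_context(longer, "€", 3, 1))   # the output is cut off before FF
```

With an unknown number of lines between the € line and FF, no fixed `-A` value is right for every block, which is why the answers based on the AAAA and FF markers are more robust.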

edited Mar 03 '17 at 09:19

answered Mar 03 '17 at 08:53

As a side note, there are lots of other ways to do this, including python re, however for searching through large data, I've found grep to be the fastest. – Nick H Mar 03 '17 at 08:56
Can you please have a look at OP's expected output? Your command gives `AAAA FF AAAA €330 - Roma FF ` – Inian Mar 03 '17 at 08:57
Oh, I see - between the two parameters. Ok I'll edit that. – Nick H Mar 03 '17 at 08:59
This way it doesn't return the number 1234 – Francesco Mar 03 '17 at 09:08

score 1 · Answer 3 · answered Mar 03 '17 at 09:45

Python is a procedural language, so it may require more text, but it is simpler for complex things. Here you should:

- start storing when you see an AAAA line
- stop storing when you see an FF line
- only keep the stored text if it contains a €

That can be translated in Python as:

infile = "testfile.txt"                # path to the input file (adjust as needed)

with open(infile, encoding="utf-8") as fd:
    processing = False                 # True while inside an AAAA ... FF block
    txt = ""
    euro = False
    for line in fd:
        if line.strip() == 'AAAA':     # start processing
            processing = True
            txt = ""
            euro = False
        if processing:
            txt += line                # store all lines between AAAA and FF
            if '€' in line:            # is a € present?
                euro = True
            if line.strip() == 'FF':   # stop processing
                processing = False
                if euro:               # only print if a € was found
                    print(txt, end="") # txt already ends with a newline

Not as compact as an awk, grep or sed script, but simple to write, read and maintain.

score 0 · Answer 4 · answered Mar 04 '17 at 20:11

awk 'NR>5' file

AAAA
1234
title example
€330 - Roma
FF

answered Mar 04 '17 at 20:11

Claes Wikner

score 0 · Answer 5 · answered Mar 11 '17 at 15:23

awk '/\xe2\x82\xac/{printf "%s%s", RS, $0}' RS=AAAA file
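
Since the one-liner above comes without explanation, here is a hedged Python sketch of what it appears to do: treat AAAA as the record separator, keep the records containing the euro sign, and print each one with the separator put back in front. `records_with_euro` is a hypothetical name; the sketch assumes UTF-8 input, where € is the byte sequence `\xe2\x82\xac`.

```python
def records_with_euro(text, separator="AAAA"):
    """Split text on the separator and return the records containing a euro sign."""
    records = text.split(separator)            # similar to awk with RS=AAAA
    return [separator + rec for rec in records if "€" in rec]


text = (
    "AAAA\n1234\ntitle example\nLorem Ipsum\nFF\n"
    "AAAA\n1234\ntitle example\n€330 - Roma\nFF\n"
)
for record in records_with_euro(text):
    print(record, end="")

# The euro sign really is those three bytes in UTF-8.
print("€".encode("utf-8") == b"\xe2\x82\xac")
```

Note that this approach ignores FF entirely: anything between an FF line and the next AAAA would be printed as part of the record, and an AAAA appearing in the middle of a line would also split the text.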

answered Mar 11 '17 at 15:23

mop

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Donald Duck Mar 11 '17 at 19:44
The Euro symbol € is `\xe2\x82\xac` in UTF-8; you can check with `echo € | hexdump -C` – mop Mar 12 '17 at 04:42
Why don't you use /€330/ instead of something cryptic? Like this: awk '/€330/{printf RS $0}' RS=AAAA file – Claes Wikner Mar 12 '17 at 20:32
