3

I have this text file:

AAAA
1234
title example
Lorem Ipsum
FF
AAAA
1234
title example
€330 - Roma
FF 

I want to extract from this file only the text that:

starts with AAAA
contains the Euro symbol
ends with FF

In this case I want to match only this:

AAAA
1234
title example
€330 - Roma
FF 

I tried different solutions. For example, I used

sed -e '/AAAAs/,/europ/,/FF/!d' testfile.txt

but it extracts all the text between AAAA and FF.

How can I solve this?

Thanks for your help.

EDIT:

Between the Euro line and FF there could be more text; I don't know how many lines.

AAAA
1234
title example
€330 - Roma
Some text with \n, comma symbol etc etc
FF

I want to extract the text between AAAA and FF.

fedorqui
Francesco

5 Answers

3

With sed:

 sed -n '/^AAAA/{:a;N;/\nFF/!ba; /€/p}' file

How it works:

  • /^AAAA/: selects lines starting with AAAA
  • :a: defines label a for the upcoming loop
  • N: appends the next line to the pattern space
  • /\nFF/!: if a newline followed by FF is not found,
  • ba: branches back to label a to append the next line to the pattern space
  • /€/p: prints the pattern space if it contains €

Edit:

As suggested by @potong in the comments, with GNU sed you can also use the M flag to match your regex in multi-line mode:

sed -n '/^AAAA/{:a;N;/^FF/M!ba; /€/p}' file
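The same idea can be sketched in Python with a multi-line regular expression, where re.MULTILINE plays the role of the M flag. This is only an illustration: the helper name euro_blocks is hypothetical, the sample text is copied from the question, and the pattern assumes that AAAA and FF each sit alone on their line (trailing spaces allowed), which is slightly stricter than the sed version.

```python
import re

# Lazy match from a line that is exactly AAAA to the first line that is exactly FF.
# re.MULTILINE lets ^ and $ match at line boundaries; re.DOTALL lets . cross newlines.
BLOCK = re.compile(r"^AAAA[ \t]*\n.*?^FF[ \t]*$", re.MULTILINE | re.DOTALL)


def euro_blocks(text):
    """Return every AAAA..FF block that contains a euro sign (hypothetical helper)."""
    return [m.group(0) for m in BLOCK.finditer(text) if "€" in m.group(0)]


# Sample data from the question, including the extra line from the EDIT.
sample = (
    "AAAA\n1234\ntitle example\nLorem Ipsum\nFF\n"
    "AAAA\n1234\ntitle example\n€330 - Roma\nSome text\nFF \n"
)

for block in euro_blocks(sample):
    print(block)
```

As in the sed loop, the block is collected first and the check for € happens only once the closing FF line has been reached.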
SLePort
1

A nice quick way would be to use grep with its context options. So for your needs:

grep -B3 -A1 -e '€' test.txt

This will find the Euro symbol and print the 3 lines before it and the 1 line after it. However, this only works if the file keeps the same pattern, i.e. AAAA and FF always occur the same number of lines above and below the Euro line.
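To see why a fixed context window is fragile, here is a small Python sketch that imitates the effect of -B and -A on a list of lines. The helper grep_context is hypothetical and ignores details of real grep, such as the "--" separators between groups; the two inputs are the blocks from the question and from its EDIT.

```python
def grep_context(lines, needle, before, after):
    """Keep lines within a fixed window around each line containing needle.

    Hypothetical helper that only imitates the effect of grep -B/-A.
    """
    keep = set()
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))
    return [lines[i] for i in sorted(keep)]


fixed = ["AAAA", "1234", "title example", "€330 - Roma", "FF"]
longer = ["AAAA", "1234", "title example", "€330 - Roma", "Some text", "FF"]

print(grep_context(fixed, "€", 3, 1))   # the whole block, FF included
print(grep_context(longer, "€", 3, 1))  # stops at "Some text", FF is lost
```

With the extra line from the EDIT, the window ends before FF, so this approach only suits files whose blocks always have the same layout.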

Nick H
1

Python is a procedural language, so it may require more text but is simpler for complex things. Here you should:

  • start storing when you see an AAAA line
  • end storing when you see an FF line and
    • only keep the stored text if it contains a €

That can be translated in Python as:

infile = 'testfile.txt'                # input path (file name taken from the question)
with open(infile, encoding='utf-8') as fd:
    processing = False
    txt = None
    euro = None
    for line in fd:
        if line.strip() == 'AAAA':     # start processing
            processing = True
            txt = ""
            euro = False
        if processing:
            txt += line                # store all lines between AAAA and FF
            if '€' in line: euro = True    # is an € present ?
            if line.strip() == 'FF':   # stop processing
                processing = False
                if euro:               # only print if a € was found
                    print(txt)

Not as compact as an awk, grep or sed script, but simple to write, read and maintain.

Serge Ballesta
0
awk 'NR>5' file

AAAA
1234
title example
€330 - Roma
FF 
Claes Wikner
0
awk '/\xe2\x82\xac/{printf RS $0}' RS=AAAA file
mop
  • While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Donald Duck Mar 11 '17 at 19:44
  • Euro SYmbol => € => \xe2\x82\xac echo €|hexdump -C – mop Mar 12 '17 at 04:42
  • Why don't you use /€330/ instead of something cryptic? Like this: awk '/€330/{printf RS $0}' RS=AAAA file – Claes Wikner Mar 12 '17 at 20:32
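For readers wondering how the RS=AAAA one-liner works: it treats AAAA as the record separator, so each record is the text between two AAAA markers, and it prints the separator plus the record whenever the record contains the Euro sign. A multi-character RS is an extension found in gawk and mawk, so this may not behave the same in every awk. The Python sketch below mirrors that logic under the same assumptions; the helper name records_with_euro is hypothetical and the sample is the file from the question.

```python
def records_with_euro(text, sep="AAAA"):
    """Split text on sep and return the records containing a euro sign,
    with sep put back in front (hypothetical helper mirroring RS=AAAA)."""
    return [sep + record for record in text.split(sep) if "€" in record]


sample = (
    "AAAA\n1234\ntitle example\nLorem Ipsum\nFF\n"
    "AAAA\n1234\ntitle example\n€330 - Roma\nFF \n"
)

# Each record already ends with a newline, so no extra one is added.
print("".join(records_with_euro(sample)), end="")

# The escape sequence in the awk pattern is the UTF-8 encoding of the Euro sign.
print("€".encode("utf-8") == b"\xe2\x82\xac")
```

Note that neither the awk command nor this sketch checks that a record ends with FF; both rely on every AAAA block being closed by FF before the next AAAA.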