Regex: Extract string between string1 and string2, where string2 can be one of several strings

Question

I have multiple text files that resemble dictionary entries. One such file can look like this:

MEANING:
content1
content2
IDIOM:
content3
content4

Another can look like this:

MEANING:
content1
content2
SYNONYMS:
content2
content3
content5

Now I want to extract the content of the "MEANING" section with a single sed command line. Here is my idea for the first text file, where "IDIOM" comes after the "MEANING" part:

cat dicentry1.txt | sed -e 's/MEANING\(.*\)IDIOM/\1/'

The problem is that the output is:

MEANING:
content1
content2
IDIOM:
content3

So the command doesn't work at all, even though user "Brian Campbell" suggested the exact same line, just with other values, in this thread: How to use sed/grep to extract text between two words?

My second problem is doing the same with the second file, where "SYNONYMS" comes after the "MEANING" part. Technically, I could run the same command as above with "SYNONYMS" instead of "IDIOM". However, wouldn't something like this be possible?

DISCLAIMER: This is just an idea and the syntax may be completely wrong; I apologize for that in advance. T.T

cat anydicentry.txt | sed -e 's/MEANING\(.*\)\(IDIOM|SYNONYM\)/\1/'

This line is supposed to copy everything after "MEANING" up to the point where either "IDIOM" or "SYNONYMS" appears. However, I still can't get it working and I have no idea how I could implement it.

I hope my two issues are clear.

Thanks in advance, guys!

`sed -e 's/MEANING\(.*\)IDIOM/\1/'` would work if you had `MEANING some text here IDIOM` on one line. `sed` only searches on a single line by default. — Wiktor Stribiżew, Jun 16 '18 at 18:00
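A small demonstration of the comment above, written as a sketch that assumes GNU sed: `-z` (split input on NUL bytes instead of newlines, so a text file without NULs is read as one pattern space) and `\|` alternation in a basic regex are GNU extensions, so the second part will not run on BSD/macOS `sed`. The function name `extract_meaning_gnu` is hypothetical.

```shell
#!/bin/sh
# 1) Both markers on ONE line: the substitution from the question matches.
printf 'MEANING some text here IDIOM\n' | sed -e 's/MEANING\(.*\)IDIOM/\1/'
# prints " some text here " (the spaces around the text are kept)

# 2) Sketch assuming GNU sed. -z splits input on NUL bytes instead of
# newlines, so a plain text file becomes a single pattern space and the
# regex can span several lines. \| (alternation in a basic regex) is
# also a GNU extension. extract_meaning_gnu is a hypothetical name.
extract_meaning_gnu() {
  # Keep only the text between "MEANING:" and the following
  # "IDIOM:" or "SYNONYMS:" header, then end it with a newline.
  sed -z 's/.*MEANING:\n\(.*\)\n\(IDIOM\|SYNONYMS\):.*/\1\n/' "$1"
}

# Usage: extract_meaning_gnu dicentry1.txt
```

Because `\(.*\)` is greedy, it runs up to the last `IDIOM:` or `SYNONYMS:` header, so this sketch assumes only one of them follows `MEANING:`, as in both sample files. If neither header is present, the substitution does not match and the file is printed unchanged.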

score 2 · Answer 1 · answered Jun 16 '18 at 18:20


For processing files line by line like this, awk is a much better tool, since awk is a complete programming language.

awk '/^(IDIOM|SYNONYMS)/{p=0} p; /^MEANING/{p=1}' file

content1
content2

Note that the same output is produced for both of your input files.

Explanation:

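/^(IDIOM|SYNONYMS)/{p=0}: When a line starts with IDIOM or SYNONYMS, reset the flag p to 0
p;: While p is 1, print each line (awk's default action)
/^MEANING/{p=1}: When a line starts with MEANING, set the flag p to 1

The order of the three statements matters: p is set only after the print test, so the MEANING: line itself is skipped, and it is reset before the print test, so the closing IDIOM: or SYNONYMS: line is skipped too.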
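The same flag technique can be parameterized so the section name is not hard-coded. This is a minimal sketch, assuming every section header is an uppercase word followed by a colon on a line of its own (as in both sample files); the function name `extract_section` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the body of section "$1" from file "$2".
# Assumes headers look like "MEANING:" or "IDIOM:" on their own lines.
extract_section() {
  awk -v sec="$1" '
    # Any header line sets the flag: 1 for the wanted section, 0 for
    # every other one. "next" keeps header lines out of the output.
    /^[A-Z]+:$/ { p = ($0 == sec ":"); next }
    # Print body lines while the flag is set (default action).
    p
  ' "$2"
}

# Usage: extract_section MEANING dicentry1.txt
```

Unlike the fixed `IDIOM|SYNONYMS` list, this stops at any header, so it keeps working if a file contains a section name not listed in the pattern.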

anubhava

Sorry for the late response, and thanks a lot for your answer! Isn't there a way to use sed instead of awk? – Two-Tu Jun 17 '18 at 13:34
It can be done using `sed`, as you can see in the answer posted by Cyrus, though it is much more straightforward using `awk`. – anubhava Jun 17 '18 at 13:43

score 0 · Accepted Answer · answered Jun 16 '18 at 18:38


sed -n '/^MEANING:$/,/^[A-Z]*:$/{/^MEANING:$/d;/^[A-Z]*:$/d;p;}' file

Output:

content1
content2
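For readability, the sed one-liner can be spread over several lines with comments. This is a minimal sketch: the function name `extract_meaning_sed` is hypothetical, and the separate `/^MEANING:$/d` is folded into the generic header delete, because `^[A-Z]*:$` already matches `MEANING:`.

```shell
#!/bin/sh
# Hypothetical helper: the sed command above, commented.
extract_meaning_sed() {
  sed -n '
    # Select lines from the "MEANING:" header through the next line that
    # consists only of capital letters and a colon (e.g. "IDIOM:").
    /^MEANING:$/,/^[A-Z]*:$/ {
      # Delete the header lines that open and close the range
      # (this pattern also matches "MEANING:" itself)...
      /^[A-Z]*:$/d
      # ...and print every line in between (-n disables auto-print).
      p
    }
  ' "$1"
}

# Usage: extract_meaning_sed dicentry1.txt
```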

Cyrus
