3

I have a script that reads a log file line-by-line. I need to extract the text between two substrings, if they exist in the line my script is currently reading.

For instance, if a line has:

some random text here substring A abc/def/ghi substring B

I need to extract the text abc/def/ghi that is between substring A and substring B by storing it in a variable. How would I go about doing this?

I looked through "Extract substring in Bash" but can't find anything that exactly matches my use case.

John Kugelman
Ricardo Francois

2 Answers

6

Bash provides parameter expansion with substring removal, which lets you trim through "substring A" from the front and then trim "substring B" from the back, leaving "abc/def/ghi". For example, you can do:

ssa="substring A"         ## substrings to find text between
ssb="substring B"

line="some random text here substring A abc/def/ghi substring B"

text="${line#*${ssa}}"    ## trim through $ssa from the front (left)
text="${text%${ssb}*}"    ## trim through $ssb from the back (right)

echo $text                ## output result

Example Output

abc/def/ghi
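The question also asks to extract only "if they exist in the line". A minimal sketch of how the two expansions above could be wrapped in a function with that check follows. The name extract_between is hypothetical, not something from the answer. Note that the unquoted echo $text above relies on word splitting to drop the spaces around the result; the sketch trims them explicitly so that the value stored in the variable is already clean.

```shell
#!/usr/bin/env bash
# extract_between LINE START END   (hypothetical helper, for illustration only)
# Prints the text between the first START and the last END in LINE.
# Returns 1 if START followed by END does not occur in LINE.
extract_between() {
    local line=$1 start=$2 end=$3 rest
    # Quoting the variables inside the pattern makes them match literally.
    [[ $line == *"$start"*"$end"* ]] || return 1
    rest=${line#*"$start"}                  # drop everything through the first START
    rest=${rest%"$end"*}                    # drop the last END and everything after it
    rest=${rest#"${rest%%[![:space:]]*}"}   # strip leading whitespace
    rest=${rest%"${rest##*[![:space:]]}"}   # strip trailing whitespace
    printf '%s\n' "$rest"
}

line="some random text here substring A abc/def/ghi substring B"
if text=$(extract_between "$line" "substring A" "substring B"); then
    echo "[$text]"    # prints [abc/def/ghi]
fi
```

The if test means the variable is only used when both substrings were found, so lines without them can simply be skipped.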

The two basic forms for trimming from the front of a string, and the two for trimming from the back, are:

${var#pattern}      # Strip shortest match of pattern from front of $var
${var##pattern}     # Strip longest match of pattern from front of $var
${var%pattern}      # Strip shortest match of pattern from back of $var
${var%%pattern}     # Strip longest match of pattern from back of $var

Where pattern can contain globbing characters such as '*' and '?'. Look things over and let me know if you have any further questions.
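To make the difference between shortest and longest match concrete, here is a small sketch that applies all four forms to one value. The path is a made-up example, not taken from the question.

```shell
#!/usr/bin/env bash
path="/var/log/app/server.log.gz"   # hypothetical value for illustration

front_short=${path#*/}     # shortest front match of '*/' -> var/log/app/server.log.gz
front_long=${path##*/}     # longest front match of '*/'  -> server.log.gz
back_short=${path%.*}      # shortest back match of '.*'  -> /var/log/app/server.log
back_long=${path%%.*}      # longest back match of '.*'   -> /var/log/app/server

printf '%s\n' "$front_short" "$front_long" "$back_short" "$back_long"
```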

Using BASH_REMATCH

BASH_REMATCH is an internal array that holds the results of a [[ text =~ REGEX ]] match. ${BASH_REMATCH[0]} is the entire text matched by REGEX, and ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]}, etc. hold the portions matched by each (...) capture group within the regular expression (you can provide multiple captures).

Using the same setup as above, you could modify the script to replace the parameter expansions on text with:

regex="^.*${ssa} ([^ ]+) ${ssb}.*$"   ## REGEX to match with (..) capture

[[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]}"

The regular expression in $regex matches the entire line and captures the run of non-space characters between $ssa and $ssb. The complete modified script would be:

ssa="substring A"         ## substrings to find text between
ssb="substring B"

line="some random text here substring A abc/def/ghi substring B"

regex="^.*${ssa} ([^ ]+) ${ssb}.*$"   ## REGEX to match with (..) capture

[[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]}"

(same output)

Both methods are fully explained in man 1 bash. Use whichever fits the circumstance you are faced with. I have always found parameter expansion a bit more intuitive (and you can incrementally whittle text down to just about anything you need). However, extended regular expression matching is a powerful alternative to the parameter expansions.
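Since the question is about reading a log file line by line, here is a sketch of how the BASH_REMATCH approach might sit inside such a loop. The temporary file and its contents are stand-ins invented for the example, and the capture is widened to (.*) on the assumption that the text between the markers may contain spaces. The markers are interpolated into the regex as-is, so any regex metacharacters in them would need escaping.

```shell
#!/usr/bin/env bash
ssa="substring A"
ssb="substring B"
regex="${ssa} (.*) ${ssb}"     # (.*) also captures text that contains spaces

logfile=$(mktemp)              # hypothetical stand-in for the real log file
printf '%s\n' \
    "some random text here substring A abc/def/ghi substring B" \
    "a line without the markers" \
    "prefix substring A two words substring B suffix" > "$logfile"

results=()
while IFS= read -r line; do
    # Lines lacking either marker fail the match and are skipped.
    if [[ $line =~ $regex ]]; then
        results+=("${BASH_REMATCH[1]}")
    fi
done < "$logfile"
rm -f "$logfile"

printf '%s\n' "${results[@]}"   # prints abc/def/ghi, then two words
```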

David C. Rankin
1

I believe you can do this:

var="$(echo "some random text here substring A abc/def/ghi substring B"|grep -oP "substring A \K(.*)(?=\ substring B)")"

# which produces:
echo $var
abc/def/ghi

Or, if you find the following grep more readable and easier to understand, you can use this instead:

grep -oP "(?<=substring\ A\ )(.*)(?=\ substring B)"

This is essentially the same logic as the above one.

This will also work if the searched/matched string is 2 or more words.


Edit 1:

So now I understand that you are trying to do this by extracting the last line of a file and then doing the regex matching? You can do:

var="$(tail -n1 file.txt|grep -oP "(?<=substring\ A\ )(.*)(?=\ substring B)")"

if you are sure that the file's last line will always match the pattern in your original question.
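If that is not guaranteed, one possible way to guard against a non-matching last line is to test grep's exit status, which is non-zero when nothing matches. The sketch below assumes GNU grep built with -P (PCRE) support and uses a temporary file as a stand-in for file.txt.

```shell
#!/usr/bin/env bash
logfile=$(mktemp)   # hypothetical stand-in for file.txt
printf '%s\n' "first line" \
    "some random text here substring A abc/def/ghi substring B" > "$logfile"

# The assignment takes the exit status of the pipeline's last command (grep),
# so the first branch only runs when the last line actually matched.
if var=$(tail -n1 "$logfile" | grep -oP '(?<=substring A ).*(?= substring B)'); then
    echo "extracted: $var"
else
    echo "last line has no match" >&2
fi
rm -f "$logfile"
```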

Ron
  • Sorry, I'm super new to shell scripting. I'm trying out the 2nd command you posted. How do I make sure that it is applied to the last line of the log file I'm reading? – Ricardo Francois Dec 27 '21 at 03:08
  • Yeah looks like I was able to extract the last line, but the commands don't seem to work :/ – Ricardo Francois Dec 27 '21 at 03:29
  • @RicardoFrancois Please provide more information on how you extract the last line, or what the input file looks like at its very end... If you change your Question, obviously the solution can change too. Show what you expect as an outcome and what you've tried exactly. – Ron Dec 27 '21 at 03:43
  • The solution works.. it's how you implement it that might not be working for you ;) – Ron Dec 27 '21 at 03:47
  • Note that the first example, which uses `\K`, only works if there is NO space between `(.*)` and `(?=\ substring B)`, as in the other two code blocks that use lookbehinds. With a space between the parentheses, the pattern searches for **two** spaces before the word "substring" instead of one. – Levi Uzodike Oct 18 '22 at 18:29