Extracting text between two strings. These strings have spaces and are saved in variables

Question

I need to extract all the text between two dates, given in the following form (the format is Month Day Hour):

start_marker: "Jul  3 2" 
end_marker: "Jul  3 7"

from a log file that has data in the following example format

<unneeded text>
Fri Jul  3 2:51:54:780 2020
<needed text> 
<needed text> 
<needed text> 
Fri Jul  3 5:51:54:780 2020 
<needed text> 
<needed text> 
Fri Jul  3 7:51:54:780 2020 
<unneeded text>

I am trying the below script but it returns a blank log_collector file

start_month="Jul"
start_date="3"
start_hour="2"

end_month="Jul"
end_date="3"
end_hour="7"

start_marker="$start_month  $start_date $start_hour"
end_marker="$end_month  $end_date $end_hour"

sed -n '/"$start_marker"/,/"$end_marker"/p' logfile >> "log_collector"

cat log_collector

Your sed script is enclosed in single quotes, so no variable expansion takes place. Try removing the double quotes and then change single to double quotes for the sed script. – Bill Jetzer, Jul 04 '20 at 01:04
What if `start_hour`/`end_hour` does not exist? Should it search for the closest hour/timestring, x < start / end < z? – alecxs, Jul 04 '20 at 01:16
@alecxs Well, the log file keeps updating every 5 mins, so it's a very remote possibility. But I didn't think of that, should have made it clear in my question. Thanks for the call out. – Anupam, Jul 04 '20 at 14:26
Can any of the `<needed text>` lines include text that looks like a date, e.g. `Jul 3 2`? If so, how can you separate text in lines like that from the date lines you actually want to match against? – Ed Morton, Jul 06 '20 at 12:50
Hey @EdMorton, none of the `<needed text>` lines will have anything that looks like a date. – Anupam, Jul 09 '20 at 15:29

wuseman · Answer 1 · 2020-07-04T02:54:02.460

3

Use double quotes when passing shell variables to sed; otherwise the shell won't expand them and sed never sees their values. As written in your example, your script is executed like this:

+ start_month=Jul
+ start_date=3
+ start_hour=2
+ end_month=Jul
+ end_date=3
+ end_hour=7
+ start_marker='Jul  3 2'
+ end_marker='Jul  3 7'
+ sed -n '/"$start_marker"/,/"$end_marker"/p' logfile 
+ cat log_collector
...empty file

Instead try:

sed -n "/${start_marker}/,/${end_marker}/p" logfile >> "log_collector"

Result:

+ variables...
+ sed -n '/Jul  3 2/,/Jul  3 7/p' logfile
+ cat log_collector
Fri Jul  3 2:51:54:780 2020
text...

And your script will now output the lines between the two markers, as you want.

But I really don't see the point of the start_* and end_* variables when you are using *_marker for the same values, but maybe it was just a bad/confusing example :)

Hint: Launch your script with 'bash -x' or add 'set -x' and you will see how each command is actually executed.

Edit: I see Bill Jetzer was faster in the comments; still, see the examples above.

edited Jul 04 '20 at 02:54

answered Jul 04 '20 at 02:30


Thanks for clear example, bit new here on sed. Got to know that I had to use the curly brackets too. Also, I updated the question, you are right about the variables and markers, I made them same to avoid confusion. – Anupam Jul 04 '20 at 14:35
@Anupam you don't need the curly brackets - they don't hurt but the script would behave exactly the same without them. The issue with your script was incorrect quoting, copy/paste your original script into http://shellcheck.net to see the issues. Do be aware though that without boundaries the script contains bugs - `Jul 3 1` would match the line `Jul 3 12`, for example. – Ed Morton Jul 05 '20 at 19:18
Thanks @EdMorton, appreciate the input. Forgive my noob question, but by boundaries you mean the curly brackets? If not, then how would I set a boundary for this? – Anupam Jul 05 '20 at 22:05
No, the curly brackets just delimit a variable name when there's no existing separator, e.g. `foo${var}bar` to expand the value of `$var` between `foo` and `bar`. Boundaries in the context of a regexp define where the matching string must begin/end, e.g. `$` for end of string, `^` for start of string, `\b` for start/end of a word in some tools, etc. You should ask the person who posted this answer how to make it more robust. – Ed Morton Jul 06 '20 at 12:41
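
To make the boundary point concrete, here is a minimal sketch (not taken from the answer or the comments) that appends the `:` following the hour to each marker, so that `Jul  3 1` can no longer match a `Jul  3 12` line. The log contents and the file names `sample.log`, `loose.log` and `strict.log` are invented for this illustration, and it assumes the hour is always followed by a colon as in the question's sample.

```shell
#!/bin/bash
# Hypothetical sample log, made up for this illustration.
cat > sample.log <<'EOF'
skip me
Fri Jul  3 1:51:54:780 2020
needed A
Fri Jul  3 7:51:54:780 2020
skip me too
Fri Jul  3 12:51:54:780 2020
late text
Fri Jul  3 17:51:54:780 2020
EOF

start_marker="Jul  3 1"
end_marker="Jul  3 7"

# Unanchored: "Jul  3 1" is also a prefix of "Jul  3 12", so a second
# range opens at the 12:51 line and runs to the end of the file.
sed -n "/$start_marker/,/$end_marker/p" sample.log > loose.log

# Anchored with the trailing colon: only the hours 1 and 7 can match.
sed -n "/$start_marker:/,/$end_marker:/p" sample.log > strict.log

echo "loose:  $(wc -l < loose.log) lines"
echo "strict: $(wc -l < strict.log) lines"
```

The only change is the extra `:` inside the double-quoted sed script; the markers themselves stay as in the question.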

Ed Morton · Answer 2 · 2020-07-06T13:33:49.507

FWIW I'd use a flag (inRange below) instead of a range (which excludes sed since it doesn't have variables) and only check for the date/time markers on lines that look like your date/time lines (hence the long-ish regexp below):

$ cat tst.awk
BEGIN { FS = "[[:space:]:]+" }
/^([[:upper:]][[:lower:]]{2} +){2}[0-9]{1,2} +([0-9]{1,2}:){3}[0-9]{3} +[0-9]{4} *$/ {
    marker = $2" "$3" "$4
}
marker == start_marker { inRange = 1 }
inRange { print }
marker == end_marker { inRange = 0 }


$ awk -v start_marker='Jul 3 2' -v end_marker='Jul 3 7' -f tst.awk file
Fri Jul  3 2:51:54:780 2020
<needed text>
<needed text>
<needed text>
Fri Jul  3 5:51:54:780 2020
<needed text>
<needed text>
Fri Jul  3 7:51:54:780 2020

See "Is a /start/,/end/ range expression ever useful in awk?" for why I wouldn't use a range expression (/start/,/end/).
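
As a rough sketch of how the flag approach could be wired up to the shell variables from the question: the markers are built here with single spaces, because the awk side joins the split fields with one space each. The date-line test below is a simplified stand-in for the longer regexp above (an assumption made for brevity, not the original check), and `sample.log` is an invented file name.

```shell
#!/bin/bash
# Hypothetical sample log in the format shown in the question.
cat > sample.log <<'EOF'
<unneeded text>
Fri Jul  3 2:51:54:780 2020
<needed text>
Fri Jul  3 5:51:54:780 2020
<needed text>
Fri Jul  3 7:51:54:780 2020
<unneeded text>
EOF

start_month="Jul"; start_date="3"; start_hour="2"
end_month="Jul";   end_date="3";   end_hour="7"

# Single spaces: awk compares against fields re-joined with one space.
start_marker="$start_month $start_date $start_hour"
end_marker="$end_month $end_date $end_hour"

awk -v start_marker="$start_marker" -v end_marker="$end_marker" '
BEGIN { FS = "[ \t:]+" }   # split on runs of blanks and colons
# Simplified date-line test: weekday, month, numeric hour, enough fields.
NF >= 8 && $1 ~ /^[A-Z][a-z][a-z]$/ && $2 ~ /^[A-Z][a-z][a-z]$/ && $4 ~ /^[0-9]+$/ {
    marker = $2 " " $3 " " $4
}
marker == start_marker { inRange = 1 }
inRange                { print }
marker == end_marker   { inRange = 0 }
' sample.log > log_collector

cat log_collector
```

Passing the markers with `-v` keeps the shell quoting problem out of the awk program entirely, since the program text can stay in single quotes.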
