I'm trying to write a simple script that uses a regex. The regex works in text editors and online regex checkers, but I can't figure out how to make it work in bash.
I also need to capture groups.
Example text:
2020-03-06 10:00:07 Test2: <?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soape...
2020-03-06 10:00:13 Test2: <?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soape...
This is my script. It reads each line and creates a file named DATE_TIME.xml containing the rest of the line (after formatting it with xmllint):
#!/bin/bash
: ${1?"USAGE: $0 FILE-NAME"} # Exit with a usage message if no argument is passed
regex="^(\d*-\d*-\d*)\s(\d*:\d*:\d*)\s(\w*): (.*)$" # This one works in editors
mkdir -p out
while IFS= read -r line
do
if [[ $line =~ $regex ]] #IT NEVER ENTERS HERE
then
date="${BASH_REMATCH[1]}" #DATE
time="${BASH_REMATCH[2]}" #TIME
time="${time//:/-}" # Replace every : with -
name="${BASH_REMATCH[3]}" # Not used for now
text="${BASH_REMATCH[4]}" #TEXT
echo "$text" | xmllint --format - > "out/${date}_${time}.xml"
fi
done < "$1"
I've also tried this regex, but I'm sure it has errors:
regex="^([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}) ([[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}) ([[a-zA-Z0-9]]{1,}): (*{1,})$"
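For comparison, here is a minimal sketch of the same pattern written only with POSIX character classes. It assumes that bash's `=~` operator uses POSIX extended regular expressions, where `\d` is not available. The sample `line` value (including the `<root/>` element) is made up for illustration and is not taken from my real log:

```shell
#!/bin/bash
# Sketch only: POSIX ERE form of the pattern, avoiding \d, \s and \w.
regex='^([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})[[:space:]]+([[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2})[[:space:]]+([[:alnum:]_]+): (.*)$'

# Hypothetical sample line, shortened for the example.
line='2020-03-06 10:00:07 Test2: <?xml version="1.0" encoding="UTF-8"?><root/>'

if [[ $line =~ $regex ]]; then        # $regex must stay unquoted to be treated as a pattern
    date="${BASH_REMATCH[1]}"          # first capture group: date
    time="${BASH_REMATCH[2]//:/-}"     # second capture group, every : replaced with -
    name="${BASH_REMATCH[3]}"          # third capture group: name before the colon
    text="${BASH_REMATCH[4]}"          # fourth capture group: rest of the line
    printf '%s_%s.xml\n' "$date" "$time"   # prints 2020-03-06_10-00-07.xml
fi
```

Keeping the pattern in a single-quoted variable and leaving `$regex` unquoted on the right-hand side of `=~` is what lets bash interpret it as a regex rather than a literal string.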
Thank you.