
I'm trying to solve a problem in my script: I can't match the date and time (YYYY-MM-DD HH:MM:SS) inside a for loop.

list='"dt_txt":"2022-06-03 21:00:00"},'

regex_datehour='"dt_txt":"([0-9,-]*.[0-9,:]*)'




for i in $list; do
    [[ $i =~ $regex_datehour ]] && echo "${BASH_REMATCH[1]}"
done

It seems that the "." between the two bracket expressions is not matching the space. I suspect this because if I replace the space between the date and the time in the list with a _, it works as intended: list='"dt_txt":"2022-06-03_21:00:00"},'

desired output:

2022-06-03 21:00:00

what I get:

2022-06-03
  • 2
    Quote all variables as a good practice `[[ "$i" =~ "$regex_datehour" ]]`. Also, check script syntax and more on https://www.shellcheck.net/ – LMC Jun 07 '22 at 00:31
  • 2
    It looks like you're using a regex to parse JSON. You should use a real JSON parser, like the `jq` utility. – Barmar Jun 07 '22 at 00:35
  • 2
    Inside your `for` loop, add `echo $i`, you will see what `for` does. It splits on the space... – Nic3500 Jun 07 '22 at 01:06
  • 5
    @LMC: This is one of the few cases where you should *not* double-quote something, because if the pattern is quoted it's taken as a literal string rather than as a regular expression. Rayan Araujo: the only problem I see is that `for i in $list; do` will split on whitespace, and therefore run `"dt_txt":"2022-06-03` and `21:00:00"},` as separate items. As Barmar said, you should probably use a real JSON parser. Also, don't use `,` to separate things in a `[ ]` character set, so e.g. use `[0-9-]` instead of `[0-9,-]`. – Gordon Davisson Jun 07 '22 at 01:08

1 Answer


The problem here is one that catches a lot of people: word splitting. In the for loop, your $list variable is not quoted and it contains a space, so the shell splits it into two words before the loop runs:

$ list='"dt_txt":"2022-06-03 21:00:00"},'
$ for i in $list ; do echo "i = $i" ; done ;
i = "dt_txt":"2022-06-03
i = 21:00:00"},
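
To make the effect concrete, here is a small sketch that counts loop iterations with and without quotes. It assumes the default IFS (space, tab, newline); the counter names are only for illustration.

```shell
#!/usr/bin/env bash
list='"dt_txt":"2022-06-03 21:00:00"},'

# Unquoted expansion: the shell splits the value on the space.
count_unquoted=0
for i in $list; do
    count_unquoted=$((count_unquoted + 1))
done

# Quoted expansion: the value stays a single word.
count_quoted=0
for i in "$list"; do
    count_quoted=$((count_quoted + 1))
done

echo "unquoted iterations: $count_unquoted"   # 2
echo "quoted iterations:   $count_quoted"     # 1
```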

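Make sure to put double quotes around every expansion that contains a variable, except the regex on the right-hand side of =~. If the pattern is quoted, it is matched as a literal string rather than as a regular expression.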
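
The regex exception is easy to check for yourself. The following is a minimal sketch with a made-up test string and pattern (not taken from the original script) that compares an unquoted and a quoted pattern on the right-hand side of =~:

```shell
#!/usr/bin/env bash
s='2022-06-03 21:00:00'
re='^[0-9-]+ [0-9:]+$'   # illustrative pattern, not the one from the question

# Unquoted: $re is interpreted as a regular expression.
if [[ "$s" =~ $re ]]; then unquoted=match; else unquoted=no-match; fi

# Quoted: "$re" is compared as a literal string, so ^, [ ], + and $ lose their meaning.
if [[ "$s" =~ "$re" ]]; then quoted=match; else quoted=no-match; fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The left-hand side ("$s") can be quoted safely; only the pattern has to stay unquoted.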

With list as an array, which is what makes sense for the for loop in your original code, it would look something like this:

#!/usr/bin/env bash
# filename: re.sh

list=(
  '"dt_txt":"2022-06-03 21:00:00"},'
  '"dt_txt":"2022-06-03 22:00:00"},'
  '"dt_txt":"2022-06-03 23:00:00"},'
)

regex_datehour='"dt_txt":"([0-9,-]*.[0-9,:]*)'

for i in "${list[@]}" ; do
    [[ "$i" =~ $regex_datehour ]] && echo "${BASH_REMATCH[1]}"
done
$ ./re.sh
2022-06-03 21:00:00
2022-06-03 22:00:00
2022-06-03 23:00:00
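
If you only have the single string from the question, you don't need a loop at all. The sketch below also drops the commas from the bracket expressions, as suggested in the comments, and spells out the digit counts. The variable names and the stricter pattern are my own assumptions about the format you want to accept, so adjust them to your data.

```shell
#!/usr/bin/env bash
line='"dt_txt":"2022-06-03 21:00:00"},'

# Hypothetical stricter pattern: YYYY-MM-DD, one literal space, HH:MM:SS.
regex_strict='"dt_txt":"([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})"'

# Word splitting does not happen inside [[ ]], so the space in the
# pattern is fine as long as the pattern comes from an unquoted variable.
if [[ "$line" =~ $regex_strict ]]; then
    datehour="${BASH_REMATCH[1]}"
    echo "$datehour"
fi
```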
danielhoherd
  • I appreciate all the help and criticism. I know I should use an appropriate tool for parsing JSON, but I took that as a challenge. @GordonDavisson I agree with you, I'm definitely not the type of programmer who copies and pastes code. I really like to understand how it works before trying – Rayan Araujo Jun 07 '22 at 03:55
  • 1
    @danielhoherd This is not my entire code! The real one does have an array; I wrote this as an example just to show the problem – Rayan Araujo Jun 07 '22 at 03:55
  • @RayanAraujo If it's a bash array, use `for i in "${list[@]}"; do`. If it's a JSON array... that's a lot more difficult. JSON allows recursive elements (e.g. arrays within arrays within arrays within...), so it (like [[X\]HTML](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)) cannot be parsed with regular expressions. – Gordon Davisson Jun 07 '22 at 04:37