1

I would like to transform a MySQL script into a JSON file and was asked to use Bash for it. By writing a simple shell script:

#!/bin/bash
# I know this script just output each entry with its value, because I haven' t gone any further
for filename in $dir/home/*.sql
do
    cat $filename | while read line
    do
        names=${line%values*}
        names=${names#*(}
        names=${names%)*}
        values=${line#*values(}
        values=${values%)*}
        while [[ $names != $currentname ]]
        do
            currentname=${names%%,*}
            currentvalue=${values%%,*}
            echo $currentname
            echo $currentvalue
            names=${names#*,}
            values=${values#*,}
        done
    done
done

I have been basically able to fulfill the requirement. However, there is one more problem. Some of the string entries has comma among its characters.

This causes a mistake that my script thinks these commas as the ones that separates values and thus a string bearing comma will be treated as two different strings.

It would be an easy task to solve this with programming languages like C++, but I have been asked to do this only with bash shell script although I am not familiar with it. So now I have been stuck with no clue. Maybe regular expression would be the cure? Or if there are other approaches please also help.

FYI, here is an example of the problem:

Input:

values(100, 'A100', 'A,100');

Expected output:

100
'A100'
'A,100'

Actual current output:

100
'A100'
'A
100'
tripleee
  • 175,061
  • 34
  • 275
  • 318
  • `It would be an easy task to solve this with programming languages like C++` Writing a parser is easy? – KamilCuk Apr 12 '21 at 07:24
  • As you said correctly, `bash` is not the first choice to write a parser in. Looking at your input, it is syntactically like a [M4 macro](https://www.gnu.org/software/m4/manual/m4-1.4.15/html_node/Input-processing.html#Input-processing). You could based on the first word of the input, create a M4 macro definition which just outputs all its arguments, and, combined with the original input, creates the result you see. It's not the most elegant solution, but easier than writing a parser. – user1934428 Apr 12 '21 at 07:25
  • @ogmskdlksdrq : Your example makes me wonder: What is the expected output of `values("a,b", 'c,\' d')`? – user1934428 Apr 12 '21 at 07:27

3 Answers3

0

Something like this may help:

data="values(100, 'A100', 'A,100');"
json=${data//values(}
json=${json//);}
json=${json//, /$'\n'}
echo "$json"
Ivan
  • 6,188
  • 1
  • 16
  • 23
0

Expected output:

Typically in shell you would match it with a regex:

echo "values(100, 'A100', 'A,100');" | sed 's/values(//; s/\(, \|);\)/\n/g'

but this does not solve the problem at all.

The best and only solution is to write a real parser for real mysql langauge to 'handle' '' ' ' 'all\tcorner\'cases' properly. Read the input char by char, store state (ex. if you are inside quotation or not), handle '\'' and other \n etc. sequences for the need of extracting the field. You might interest yourself in mysql internal lexer (it's big!) and lex and yacc programs.

Check your scripts with http://shellcheck.net . Read https://mywiki.wooledge.org/BashFAQ/001 . Quote variable expansions. Don't be nominated for useless cat award.

and was asked to use Bash for it.

Bash is a shell - it's primary role is to run and connect other programs with each other. Bash is a shell, not a full blown programming language, and writing programming stuff in it is going to be very hard or it just ends up using external programs, as that's what it's for. Write the parser in other language - use bash to run it. If you're comfortable in C++, write it in C++ inside a bash script, then compile and execute it inside a bash script.

KamilCuk
  • 120,984
  • 8
  • 59
  • 111
0

A common arrangement is to use regex for this, yes; for example, this is a requirement for parsing CSV files. But you can parse the line piece by piece like in your attempt.

However, you have a number of quoting errors which would prevent your code from working even if you figured out a way to parse the input the way you want to. (And of course, get rid of the Useless use of cat?)

while read -r line; do
    case $line in
     *values\(*\)\; );;
     *) continue;;
    esac
    line=${line#values\(}
    line=${line%\)\;}
    while [ "$line" ]; do
        case $line in
         \'*)
            line=${line#\'}
            tail=${line#*\'}
            value=\'${line%"$tail"}
            line=${tail#,}
            line=${line# };;
         *) value=${line%%,*}
            line=${line#*,}
            line=${line# };;
       esac
       echo "$value"
    done
done <"$filename"

This is probably not really the way to go, just a hint if you really want to try to tackle this in Bash. I would write a simple parser in Python if I wanted to cover all bases.

tripleee
  • 175,061
  • 34
  • 275
  • 318