1

I have a Python-like dictionary in an input file:

$ cat test.txt
db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}

Each list in the dictionary has exactly 4 elements, no more and no fewer. I need a result like:

one1="a"
two1="b"
three1="c"
four1="d"

one2="aa"
two2="bb"
three2="cc"
four2="dd"

I know this is simple in Python, but I have to do the job in a bash script. Is it possible? How can I do this in bash?

M.J
  • 2
    It's not clear what you mean by `I have` - is `db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}` a line in a file, is it a variable you're trying to populate in a shell script, is it a variable being populated in a python script or something else? Please [edit] your question to clarify what your input is, get rid of all the `...`s that are cluttering up your sample input and expected output and making it untestable, and show what you've tried yourself so far to solve the problem. – Ed Morton Jun 28 '20 at 11:47
  • @Ed Morton, thank you. In my bash script I have to read some variables from a file, so I thought a dictionary format would suit the file. So I have a test.txt that contains this dictionary format. – M.J Jun 28 '20 at 11:50
  • Don't add information in comments, [edit] your question to provide all information. – Ed Morton Jun 28 '20 at 11:50
  • @Morteza.J : Why did you use an _awk_ tag, if you are interested in a bash solution? If you are willing to go for a different programming language such as _awk_, you can as well use _python_ or _Perl_ or whatever else for it, since they all can be invoked from bash. – user1934428 Jun 29 '20 at 10:58
  • @user1934428 the big difference between awk and python or perl is that awk is a standard UNIX tool available on all UNIX boxes while the other two aren't. If someone asks for a "bash script" they very, **VERY**, rarely mean a script using only shell builtins. – Ed Morton Jun 29 '20 at 13:30

3 Answers

3

This can be done with a single sed command (Tested in GNU sed 4.8. Assumes the whole expression is in a single line and there is no embedded single quote between a pair of matching single quotes):

echo "db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}" |
sed -E "s/^[^{]*\{//; s/\}[^}]*$//; s/([^:]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\](, *)?/one\1='\2'\ntwo\1='\3'\nthree\1='\4'\nfour\1='\5'\n\n/g"

outputs

one1='a'
two1='b'
three1='c'
four1='d'

one2='aa'
two2='bb'
three2='cc'
four2='dd'

Explanation:

-E

Use extended regular expressions so that the (, ), and + characters don't need to be backslash-escaped.

s/^[^{]*\{//;

Deletes everything from the beginning of the line up to and including the first { character.

s/\}[^}]*$//;

Deletes the last } character and any characters that follow it at the end of the line.

s/([^:]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\](, *)?/one\1='\2'\ntwo\1='\3'\nthree\1='\4'\nfour\1='\5'\n\n/g
  -------    -------   -------   -------   -------   -----  -----------------------------------------------------
     1          2         3         4         5        6                      R

1: Captures the text until :
2: Captures the text between the first pair of single quotes
3: Captures the text between the second pair of single quotes
4: Captures the text between the third pair of single quotes
5: Captures the text between the fourth pair of single quotes
6: Captures the , and any number of trailing space characters. This subexpression is not used in the replacement text. ? means this is optional.
R: Replacement text. \1, \2, \3, \4, and \5 are replaced with the corresponding captured text.
The g flag at the end of the s command ensures that the replacement is applied to all matches.
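
To run the same sed program against a file rather than an echoed string, and then load the resulting assignments into the current shell, something like the following could work. This is only a sketch: it assumes GNU sed (for `\n` in the replacement) and bash, it assumes the keys are plain integers as in the question, and the temporary file stands in for test.txt. The variable names `input` and `prog` are illustrative. `declare` is used instead of `eval` so that the values are never executed as code.

```shell
#!/bin/bash
# Sketch only: assumes GNU sed, bash, and trusted input with integer keys.
# A temporary file stands in for test.txt from the question.
input=$(mktemp)
printf '%s\n' "db={1:['a','b','c','d'], 2:['aa','bb','cc','dd']}" > "$input"

# Same sed program as above, stored in a variable for readability.
prog="s/^[^{]*\{//; s/\}[^}]*$//; s/([^:]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\](, *)?/one\1='\2'\ntwo\1='\3'\nthree\1='\4'\nfour\1='\5'\n\n/g"

# Read each name='value' line and define it as a shell variable.
while IFS='=' read -r name value; do
    [ -z "$name" ] && continue      # skip the blank separator lines
    value=${value#\'}               # drop the leading single quote
    value=${value%\'}               # drop the trailing single quote
    declare "$name=$value"
done < <(sed -E "$prog" "$input")

rm -f "$input"
echo "$one1 $four2"
```

The loop reads from a process substitution rather than a pipe so that the variables are set in the current shell and not in a subshell.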

M. Nejat Aydin
1

You just need to strip off all the unnecessary characters and loop through the items to get your result:

#!/bin/bash
db="{1:['a','b','c','d'], 2:['aa','bb','cc','dd']}"
count=1
for items in `echo $db|sed 's/{//;s/}//'`
do
        echo one${count} = `echo $items|sed 's/^.*\[//;s/\].*$//'|cut -d ',' -f1`
        echo two${count} = `echo $items|sed 's/^.*\[//;s/\].*$//'|cut -d ',' -f2`
        echo three${count} = `echo $items|sed 's/^.*\[//;s/\].*$//'|cut -d ',' -f3`
        echo four${count} = `echo $items|sed 's/^.*\[//;s/\].*$//'|cut -d ',' -f4`
        echo ''
        count=`expr $count + 1`
done

Output

one1 = 'a'
two1 = 'b'
three1 = 'c'
four1 = 'd'

one2 = 'aa'
two2 = 'bb'
three2 = 'cc'
four2 = 'dd'
  • Thanks a lot, that's great. How can I add the output to an array? For example, how would I add the result of echo one${count} = `echo $items|sed 's/^.*\[//;s/\].*$//'|cut -d ',' -f1` to an array? – M.J Jun 28 '20 at 12:02
  • 1
    In addition to simply being the wrong approach (see [why-is-using-a-shell-loop-to-process-text-considered-bad-practice](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice)), that script literally contains more bugs than lines of code. Copy/paste it into http://shellcheck.net and it'll tell you about **some** (but probably not all) of the issues. – Ed Morton Jun 28 '20 at 16:35
  • 1
    @Ed Morton, Thank you. – M.J Jun 29 '20 at 03:39
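
On the follow-up question about collecting the values in an array: one possible bash-only approach, which avoids the repeated sed, cut and expr calls, is to match each key:[...] group with the `[[ =~ ]]` operator and store the captures in an associative array. This is a sketch under stated assumptions, not a general parser: it assumes bash 4 or later (for `declare -A`), integer keys, and exactly four single-quoted items per list with no embedded quotes. The names `result`, `names`, `re` and `rest` are illustrative.

```shell
#!/bin/bash
# Sketch only: assumes bash 4+, integer keys, and exactly four
# single-quoted items per list with no embedded quotes.
db="{1:['a','b','c','d'], 2:['aa','bb','cc','dd']}"

names=(one two three four)
declare -A result                   # e.g. result[one1]=a

# One key:[...] group: the key, then four quoted items.
re="([0-9]+):\['([^']*)','([^']*)','([^']*)','([^']*)'\]"

rest=$db
while [[ $rest =~ $re ]]; do
    key=${BASH_REMATCH[1]}
    for i in 0 1 2 3; do
        result[${names[i]}$key]=${BASH_REMATCH[i+2]}
    done
    # Drop everything up to and including the group just matched.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

# Print the collected values in the format asked for in the question.
for k in 1 2; do
    for n in "${names[@]}"; do
        printf '%s%s="%s"\n' "$n" "$k" "${result[$n$k]}"
    done
    echo
done
```

For large inputs a single sed or awk pass is likely to be faster than a shell loop like this one; the loop is shown only because it keeps the values in a shell array.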
1

This will work robustly using any awk in any shell on all UNIX boxes. It is also trivial to extend to more than 4 items per list: just add more number names to the string in the BEGIN section:

$ cat tst.awk
BEGIN { split("one two three four",names) }
{
    while ( match($0,/[0-9]+:\[('[^']*',?)+/) ) {
        idx = list = substr($0,RSTART,RLENGTH)

        sub(/:.*/,"",idx)
        sub(/[^[]+\[/,"",list)

        split(list,items,/'/)
        for (i=2; i in items; i+=2) {
            printf "%s%d=\"%s\"\n", names[i/2], idx, items[i]
        }
        print ""

        $0 = substr($0,RSTART+RLENGTH)
    }
}


$ awk -f tst.awk file
one1="a"
two1="b"
three1="c"
four1="d"

one2="aa"
two2="bb"
three2="cc"
four2="dd"
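
As an illustration of extending the script beyond four items, the following sketch runs the same awk logic with "five" added to the BEGIN string. The five-item input line, the temporary directory and the variable names `tmp` and `out` are made up for the demonstration and are not part of the original answer.

```shell
#!/bin/sh
# Sketch only: same logic as tst.awk, with "five" added to the names.
# The five-item input line is made up for illustration.
tmp=$(mktemp -d)

cat > "$tmp/tst.awk" <<'EOF'
BEGIN { split("one two three four five",names) }
{
    while ( match($0,/[0-9]+:\[('[^']*',?)+/) ) {
        idx = list = substr($0,RSTART,RLENGTH)
        sub(/:.*/,"",idx)
        sub(/[^[]+\[/,"",list)
        split(list,items,/'/)
        for (i=2; i in items; i+=2) {
            printf "%s%d=\"%s\"\n", names[i/2], idx, items[i]
        }
        print ""
        $0 = substr($0,RSTART+RLENGTH)
    }
}
EOF

printf '%s\n' "db={1:['a','b','c','d','e'], 2:['aa','bb','cc','dd','ee']}" > "$tmp/file"

out=$(awk -f "$tmp/tst.awk" "$tmp/file")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Only the BEGIN line differs from the four-name version; the loop over items needs no change because it stops when the split array runs out of elements.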
Ed Morton