
I'm looking for a way to do the following operation in bash:

  1. Specify an input file (JSON)

  2. Write each line to a new file (lines could optionally be filtered by a regex pattern)

  3. Name each file after a specific JSON value in the file

In previous attempts, I tried using a simple split (without the naming part) for the task, but it stopped after a certain number of lines. My biggest file has about 1000 lines.

Example input:

{
        "stuff":
        [
            { "data": "123", "filename": "abc.xml" },
            { "data": "456", "filename": "def.xml" },
            { "data": "789", "filename": "ghi.xml" }
       ]
}

Example output:

Contents of abc.xml:

<?xml version="1.0" encoding="UTF-8"?>
<data>123</data>

Contents of def.xml:

<?xml version="1.0" encoding="UTF-8"?>
<data>456</data>

PS: The example was chosen simply to give you an idea, though the input file closely resembles that of my real scenario.

idleberg
  • Post some sample input and expected output. – Ed Morton May 08 '14 at 12:51
  • Two questions: 1) Why does this have to be done with awk/bash? There are many JSON parsers/converters in programming/scripting languages. 2) If you do have a reasonable reason to do it with bash/awk, what have you done towards your goal? Which part is giving you problems? – Kent May 08 '14 at 13:26
  • @Kent The main problem is clearly the creation of the files according to the value in the JSON file. I previously used `split` to split up the file by lines, then performed a regex search&replace in Sublime Text to edit the single files. This did not address my main problem: the naming of files. Also, `split` proved to be unreliable. The total number of lines in my JSON files is roughly 9,000 to give you an idea. – idleberg May 08 '14 at 13:32
  • @idleberg The requirement is not clear enough to me. E.g. 1) You said JSON, but the structure could be completely different from your example. 2) Do you just want to handle those lines matching `data:xxx,filename:xxx`? And is the format fixed? This is important info for a text processing tool. 3) Could there be nested objects, like `data: foo[...data: bar [...]] filename`? 4) Can I understand your question as: convert the text `"data": "123", "filename": "abc.xml"` to the text in your `abc.xml`? – Kent May 08 '14 at 13:42
  • @Kent 1.) The structure of the JSON matches my scenario. The target format is not important; it could be JSON as well, as long as the name is extracted from the source JSON. 2.) Apart from the opening/closing brackets, there will be no other lines. 3.) No nested objects. 4.) Exactly – idleberg May 08 '14 at 13:49

2 Answers


Since you said the format is fixed, this should work for you:

kent$  ls 
f


kent$  awk -F'"' '/data/{printf "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<%s>%s</%s>\n", $2,$4,$2>$8; close($8) }' f

kent$  ls                                                                                                               
abc.xml  def.xml  f  ghi.xml

kent$  head *.xml
==> abc.xml <==
<?xml version="1.0" encoding="UTF-8"?>
<data>123</data>

==> def.xml <==
<?xml version="1.0" encoding="UTF-8"?>
<data>456</data>

==> ghi.xml <==
<?xml version="1.0" encoding="UTF-8"?>
<data>789</data>
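
For readers less familiar with awk, the one-liner can be expanded with comments; this sketch recreates the input file `f` from the question so it is self-contained:

```shell
# Recreate the example input from the question
cat > f <<'EOF'
{
        "stuff":
        [
            { "data": "123", "filename": "abc.xml" },
            { "data": "456", "filename": "def.xml" },
            { "data": "789", "filename": "ghi.xml" }
       ]
}
EOF

awk -F'"' '      # split each line on double quotes
/data/ {         # process only lines containing "data"
    # After splitting: $2 = key ("data"), $4 = value ("123"),
    # $8 = target filename ("abc.xml")
    printf "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<%s>%s</%s>\n", $2, $4, $2 > $8
    close($8)    # close each file so awk never runs out of file descriptors
}' f
```

The `close($8)` matters: without it, awk keeps every output file open, which can fail on inputs with many distinct filenames.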
Kent
  • @MikeH-R No, parsing json/xml/html/... is not a good use case for awk. – Kent May 08 '14 at 15:13
  • hahaha, no of course not, I was just saying for what is essentially csv data since OP said the data was this regular. – Mike H-R May 08 '14 at 15:14
  • and of course: [obligatory link to SO question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Mike H-R May 08 '14 at 15:22

I think awk is a nicer way of doing it, but if it's pure bash you want, you could try this:

➜  scripts  cat my_bash_example.sh
#!/bin/bash

# Reads alternating data/filename values from stdin; buffers each data
# value in a temp file until the matching filename arrives.
while read -r variable; do
    if [[ "$variable" =~ \.xml$ ]]; then
        echo "making xmlfile"
        echo "$variable"
        cat temp.tmp > "$variable"
    else
        echo "making tempfile"
        echo "$variable"
        echo "$variable" > temp.tmp
    fi
done
rm -f temp.tmp
➜  scripts  grep data input.json | grep -oP '(?<=data": "|filename": ").*?(?=")' | ./my_bash_example.sh

I decided to use this to practice my pipes, grep, and bash scripts. It's kinda ugly, though (and breaks in many cases). I like the awk approach better.

Oh, and btw, I used bash as that was what was stated in the question; I would use a scripting language if I were doing this myself.
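
For reference, a JSON-aware tool sidesteps the fixed-format assumption entirely. A minimal sketch with jq (assuming jq is installed and the input matches the question's structure; `input.json` is recreated here so the example runs standalone):

```shell
# Recreate the example input
cat > input.json <<'EOF'
{ "stuff": [
    { "data": "123", "filename": "abc.xml" },
    { "data": "456", "filename": "def.xml" },
    { "data": "789", "filename": "ghi.xml" }
] }
EOF

# Emit "filename<TAB>data" pairs, then write one XML file per pair
jq -r '.stuff[] | [.filename, .data] | @tsv' input.json |
while IFS=$'\t' read -r fname data; do
    printf '<?xml version="1.0" encoding="UTF-8"?>\n<data>%s</data>\n' "$data" > "$fname"
done
```

Unlike the line-oriented approaches, this still works if the JSON is reformatted, minified, or reordered.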

Mike H-R