0

Find below the input data:

[{"acc_id": 166211981, "archived": true, "access_key": "ALLLJNXXXXXXXPU4C7GA", "secret_key": "X12J6SixMaFHoXXXXZW707XXX24OXXX", "created": "2018-10-03T05:56:01.208069Z", "description": "Data Testing", "id": 11722990697, "key_field": "Ae_Appl_Number", "last_modified": "2018-10-03T08:44:20.324237Z", "list_type": "js_variables", "name": "TEST_AE_LI_KEYS_003", "project_id": 1045199007354, "s3_path": "opti-port/dcp/ue.1045199007354/11722990697"}, {"acc_id": 166211981, "archived": false, "access_key": "ALLLJNXXXXXXXPU4C7GA", "secret_key": "X12J6SixMaFHoXXXXZW707XXX24OXXX", "created": "2018-10-03T08:46:32.535653Z", "description": "Data Testing", "id": 11724290732, "key_field": "Ae_Appl_Number", "last_modified": "2018-10-03T10:11:13.167798Z", "list_type": "js_variables", "name": "TEST_AE_LI_KEYS_001", "project_id": 1045199007354, "s3_path": "opti-port/dcp/ue.1045199007354/11724290732"}]

I want the output file to contain below data:

11722990697,TEST_AE_LI_KEYS_003,opti-port/dcp/ue.1045199007354/11722990697
11724290732,EST_AE_LI_KEYS_001,opti-port/dcp/ue.1045199007354/11724290732

I am able to achieve the same by taking one record at a time and processing it using awk.but i am getting the field names also.

find below my trial:

R=cat in.txt | awk -F '},' '{print $1}'

echo $R | awk -F , '{print $7 " " $11 " " $13}'

I want it to be done for entire file without field names.

akr
  • 43
  • 7
  • Why not use a real JSON parser? – melpomene Oct 31 '18 at 04:43
  • Besides the obvious syntax errors in your shell script attempts, you need to understand [how to quote things properly](/q/10067266) (and probably also how to [avoid useless `cat`s](/q/11710552)). Maybe see also http://shellcheck.net/ – tripleee Oct 31 '18 at 04:53
  • Maybe see also https://stackoverflow.com/questions/26971987/assignment-of-variables-with-space-after-the-sign – tripleee Oct 31 '18 at 04:57
  • Possible duplicate of [Parsing JSON with Unix tools](https://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools) – tripleee Oct 31 '18 at 05:00
  • As a further aside, capturing things in a variable just so you can pipe to another command is an antipattern; just pipe to that command directly. But piping Awk to Awk is almost always silly; your script can be refactored to `awk -F '},' '{ split($1, a, ","); print a[7], a[11], a[13] }' in.txt` which coincidentally also fixes several of your errors by removing redundant steps where you had errors. – tripleee Oct 31 '18 at 05:04
  • i tried this, but i want the same to happen for all records. Also i dont want the attribute name, i want only the values.pls check the output file in the question. – akr Oct 31 '18 at 06:25

2 Answers2

2

AWK/SED is not the right tool for parsing JSON files. Use jq

[root@localhost]# jq -r '.[] | "\(.acc_id),\(.name),\(.s3_path)"' abc.json
166211981,TEST_AE_LI_KEYS_003,opti-port/dcp/ue.1045199007354/11722990697
166211981,TEST_AE_LI_KEYS_001,opti-port/dcp/ue.1045199007354/11724290732

If you don't want to install any other software then you can use python as well which is found on most of the linux machine

[root@localhost]# cat parse_json.py
#!/usr/bin/env python

# Import the json module 
import json

# Open the json file in read only mode and load the json data. It will load the data in python dictionary
with open('abc.json') as fh:
        data = json.load(fh)

# To print the dictionary
# print(data)

# To print the name key from first and second record
# print(data[0]["name"])
# print(data[1]["name"]) 
# Now to get both the records use a for loop
for i in range(0,2):
        print("%s,%s,%s") % (data[i]["access_key"],data[i]["name"],data[i]["s3_path"])
[root@localhost]# ./parse_json.py
ALLLJNXXXXXXXPU4C7GA,TEST_AE_LI_KEYS_003,opti-port/dcp/ue.1045199007354/11722990697
ALLLJNXXXXXXXPU4C7GA,TEST_AE_LI_KEYS_001,opti-port/dcp/ue.1045199007354/11724290732
Manish R
  • 2,312
  • 17
  • 13
0

Assuming the input data is in a file called input.json, you can use a Python script to fetch the attributes. Put the following content in a file called fetch_attributes.py:

import json

with open("input.json") as fh:
    data = json.load(fh)
with open("output.json", "w") as of:
    for record in data:
        of.write("%s,%s,%s\n" % (record["id"],record["name"],record["s3_path"]))

Then, run the script as:

python fetch_attributes.py

Code Explanation

  • import json - Importing Python's json library to parse the JSON.
  • with open("input.json") as fh: - Opening the input file and getting the file handler in if.
  • data = json.load(fh) - Loading the JSON input file using load() method from the json library which will populate the data variable with a Python dictionary.
  • with open("output.json", "w") as of: - Opening the output file in write mode and getting the file handler in of.
  • for record in data: - Loop over the list of records in the JSON.
  • of.write("%s,%s,%s\n" % (record["id"],record["name"],record["s3_path"])) - Fetching the required attributes from each record and writing them in the file.
Samarth
  • 627
  • 6
  • 13
  • Super.. this worked... Can u explain me this... from import statement – akr Oct 31 '18 at 06:52
  • Updated the answer. Let me know if anything is not clear. – Samarth Oct 31 '18 at 07:08
  • Thank u so much :) – akr Nov 02 '18 at 04:04
  • How to redirect the print statement to a file? @Samarth can u help – akr Nov 02 '18 at 06:40
  • 1 simple way is `python fetch_attributes.py > output_file`. But, if you want to do in Python specifically, I can update the answer. Let me know. – Samarth Nov 02 '18 at 07:07
  • my script has many other activities. if i have to go with this i can create a separate python script and go ahead as u said. But i would also like to learn on how to assign the result to a file. – akr Nov 02 '18 at 08:21
  • with open("input.json") as if: ^ SyntaxError: invalid syntax getting error :( – akr Nov 02 '18 at 09:53
  • My bad. `if` is a keyword, was using it as a variable. Updated. :( – Samarth Nov 02 '18 at 09:56
  • Traceback (most recent call last): File "fetch_attributes.py", line 4, in data = json.load(fh) File "/usr/lib64/python2.6/json/__init__.py", line 267, in load parse_constant=parse_constant, **kw) File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads return _default_decoder.decode(s) File "/usr/lib64/python2.6/json/decoder.py", line 322, in decode raise ValueError(errmsg("Extra data", s, end, len(s))) ValueError: Extra data: line 1 column 18386 - line 1 column 74055 (char 18386 - 74055) so error :| – akr Nov 02 '18 at 10:03
  • The issue is in the input data that you have provided. It does not seem to be in JSON format. The code is fine now. It is running on my system with the input you have given in the question. – Samarth Nov 02 '18 at 10:07