Text processing to fetch the attributes

Question

Find below the input data:

[{"acc_id": 166211981, "archived": true, "access_key": "ALLLJNXXXXXXXPU4C7GA", "secret_key": "X12J6SixMaFHoXXXXZW707XXX24OXXX", "created": "2018-10-03T05:56:01.208069Z", "description": "Data Testing", "id": 11722990697, "key_field": "Ae_Appl_Number", "last_modified": "2018-10-03T08:44:20.324237Z", "list_type": "js_variables", "name": "TEST_AE_LI_KEYS_003", "project_id": 1045199007354, "s3_path": "opti-port/dcp/ue.1045199007354/11722990697"}, {"acc_id": 166211981, "archived": false, "access_key": "ALLLJNXXXXXXXPU4C7GA", "secret_key": "X12J6SixMaFHoXXXXZW707XXX24OXXX", "created": "2018-10-03T08:46:32.535653Z", "description": "Data Testing", "id": 11724290732, "key_field": "Ae_Appl_Number", "last_modified": "2018-10-03T10:11:13.167798Z", "list_type": "js_variables", "name": "TEST_AE_LI_KEYS_001", "project_id": 1045199007354, "s3_path": "opti-port/dcp/ue.1045199007354/11724290732"}]

I want the output file to contain below data:

11722990697,TEST_AE_LI_KEYS_003,opti-port/dcp/ue.1045199007354/11722990697
11724290732,EST_AE_LI_KEYS_001,opti-port/dcp/ue.1045199007354/11724290732

I am able to achieve the same by taking one record at a time and processing it using awk.but i am getting the field names also.

find below my trial:

R=cat in.txt | awk -F '},' '{print $1}'

echo $R | awk -F , '{print $7 " " $11 " " $13}'

I want it to be done for entire file without field names.

Besides the obvious syntax errors in your shell script attempts, you need to understand [how to quote things properly](/q/10067266) (and probably also how to [avoid useless `cat`s](/q/11710552)). Maybe see also http://shellcheck.net/ — tripleee, Oct 31 '18 at 04:53
Maybe see also https://stackoverflow.com/questions/26971987/assignment-of-variables-with-space-after-the-sign — tripleee, Oct 31 '18 at 04:57
Possible duplicate of [Parsing JSON with Unix tools](https://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools) — tripleee, Oct 31 '18 at 05:00
As a further aside, capturing things in a variable just so you can pipe to another command is an antipattern; just pipe to that command directly. But piping Awk to Awk is almost always silly; your script can be refactored to `awk -F '},' '{ split($1, a, ","); print a[7], a[11], a[13] }' in.txt` which coincidentally also fixes several of your errors by removing redundant steps where you had errors. — tripleee, Oct 31 '18 at 05:04
i tried this, but i want the same to happen for all records. Also i dont want the attribute name, i want only the values.pls check the output file in the question. — akr, Oct 31 '18 at 06:25

Manish R · Answer 1 · 2018-10-31T07:03:25.680

2

AWK/SED is not the right tool for parsing JSON files. Use jq

[root@localhost]# jq -r '.[] | "\(.acc_id),\(.name),\(.s3_path)"' abc.json
166211981,TEST_AE_LI_KEYS_003,opti-port/dcp/ue.1045199007354/11722990697
166211981,TEST_AE_LI_KEYS_001,opti-port/dcp/ue.1045199007354/11724290732

If you don't want to install any other software then you can use python as well which is found on most of the linux machine

[root@localhost]# cat parse_json.py
#!/usr/bin/env python

# Import the json module 
import json

# Open the json file in read only mode and load the json data. It will load the data in python dictionary
with open('abc.json') as fh:
        data = json.load(fh)

# To print the dictionary
# print(data)

# To print the name key from first and second record
# print(data[0]["name"])
# print(data[1]["name"]) 
# Now to get both the records use a for loop
for i in range(0,2):
        print("%s,%s,%s") % (data[i]["access_key"],data[i]["name"],data[i]["s3_path"])
[root@localhost]# ./parse_json.py
ALLLJNXXXXXXXPU4C7GA,TEST_AE_LI_KEYS_003,opti-port/dcp/ue.1045199007354/11722990697
ALLLJNXXXXXXXPU4C7GA,TEST_AE_LI_KEYS_001,opti-port/dcp/ue.1045199007354/11724290732

edited Oct 31 '18 at 07:03

answered Oct 31 '18 at 06:16

Manish R

2,312
17
13

Follow this link to install jq or you can just copy the binary and use it. https://stedolan.github.io/jq/download/ – Manish R Oct 31 '18 at 06:24
Sorry, i m not sure if its allowed to install these in production servers. i may need lot of approvals... is there any other way to achieve... – akr Oct 31 '18 at 06:29
Like I mentioned you can just copy the binary and use it. – Manish R Oct 31 '18 at 06:30
is this for only 2 records, wat if i have n records and i do not know the value of n? – akr Nov 02 '18 at 04:02
Change the value of range in for loop accordingly – Manish R Nov 02 '18 at 04:11
i will not know how many records will be thr in the file.. den how to specify? – akr Nov 02 '18 at 06:05

Samarth · Accepted Answer · 2018-11-02T09:55:35.950

0

Assuming the input data is in a file called input.json, you can use a Python script to fetch the attributes. Put the following content in a file called fetch_attributes.py:

import json

with open("input.json") as fh:
    data = json.load(fh)
with open("output.json", "w") as of:
    for record in data:
        of.write("%s,%s,%s\n" % (record["id"],record["name"],record["s3_path"]))

Then, run the script as:

python fetch_attributes.py

Code Explanation

import json - Importing Python's json library to parse the JSON.
with open("input.json") as fh: - Opening the input file and getting the file handler in if.
data = json.load(fh) - Loading the JSON input file using load() method from the json library which will populate the data variable with a Python dictionary.
with open("output.json", "w") as of: - Opening the output file in write mode and getting the file handler in of.
for record in data: - Loop over the list of records in the JSON.
of.write("%s,%s,%s\n" % (record["id"],record["name"],record["s3_path"])) - Fetching the required attributes from each record and writing them in the file.

edited Nov 02 '18 at 09:55

answered Oct 31 '18 at 06:48

Samarth

627
6
13

Super.. this worked... Can u explain me this... from import statement – akr Oct 31 '18 at 06:52
Updated the answer. Let me know if anything is not clear. – Samarth Oct 31 '18 at 07:08
Thank u so much :) – akr Nov 02 '18 at 04:04
How to redirect the print statement to a file? @Samarth can u help – akr Nov 02 '18 at 06:40
1 simple way is `python fetch_attributes.py > output_file`. But, if you want to do in Python specifically, I can update the answer. Let me know. – Samarth Nov 02 '18 at 07:07
my script has many other activities. if i have to go with this i can create a separate python script and go ahead as u said. But i would also like to learn on how to assign the result to a file. – akr Nov 02 '18 at 08:21
with open("input.json") as if: ^ SyntaxError: invalid syntax getting error :( – akr Nov 02 '18 at 09:53
My bad. `if` is a keyword, was using it as a variable. Updated. :( – Samarth Nov 02 '18 at 09:56
Traceback (most recent call last): File "fetch_attributes.py", line 4, in data = json.load(fh) File "/usr/lib64/python2.6/json/__init__.py", line 267, in load parse_constant=parse_constant, **kw) File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads return _default_decoder.decode(s) File "/usr/lib64/python2.6/json/decoder.py", line 322, in decode raise ValueError(errmsg("Extra data", s, end, len(s))) ValueError: Extra data: line 1 column 18386 - line 1 column 74055 (char 18386 - 74055) so error :| – akr Nov 02 '18 at 10:03
The issue is in the input data that you have provided. It does not seem to be in JSON format. The code is fine now. It is running on my system with the input you have given in the question. – Samarth Nov 02 '18 at 10:07

Text processing to fetch the attributes

2 Answers2