
I am trying to reformat a JSON file and drop most of its contents. Here is the original JSON file:

       "2597401":[{"jobID":"2597401",
                 "account":"TG-CCR120014",
                 "user":"charngda",
                 "pkgT":{"pgi/7.2-  5":{"libA":["libpgc.so"],
                 "flavor":["default"]}},          
                 "startEpoch":"1338497979",
                 "runTime":"1022",
                 "execType":"user:binary",              
                 "exec":"ft.D.64",
                 "numNodes":"4",
                 "sha1":"5a79879235aa31b6a46e73b43879428e2a175db5",
                 "execEpoch":1336766742,
                 "execModify":"Fri May 11 15:05:42 2012",
                 "startTime":"Thu May 31 15:59:39 2012",
                 "numCores":"64",
                 "sizeT":{"bss":"1881400168","text":"239574","data":"22504"}},  
                 {"jobID":"2597401",
                 "account":"TG-CCR120014",
                 "user":"charngda",
                 "pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"],
                 "flavor":["default"]}},
                 "startEpoch":"1338497946",
                 "runTime":"33"  "execType":"user:binary",
                 "exec":"cg.C.64",
                 "numNodes":"4",
                 "sha1":"caf415e011e28b7e4e5b050fb61cbf71a62a9789",
                 "execEpoch":1336766735,
                "execModify":"Fri May 11 15:05:35 2012",
                "startTime":"Thu May 31 15:59:06 2012",
                "numCores":"64",
                "sizeT":{"bss":"29630984","text":"225749","data":"20360"}},
                {"jobID":"2597401",
                "account":"TG-CCR120014",
                "user":"charngda",
                "pkgT":{"pgi/7.2-5":  {"libA":["libpgc.so"],
                "flavor":["default"]}},
                "startEpoch":"1338500447",
                "runTime":"145",
                "execType":"user:binary",
                "exec":"mg.D.64",
                "numNodes":"4",
                "sha1":"173de32e1514ad097b1c051ec49c4eb240f2001f",
                "execEpoch":1336766756,
                "execModify":"Fri May 11 15:05:56 2012",
                "startTime":"Thu May 31 16:40:47 2012",
                "numCores":"64",
                "sizeT":{"bss":"456954120","text":"426186","data":"22184"}},{"jobID":"2597401",
                "account":"TG-CCR120014",
                "user":"charngda",
                "pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"],
                "flavor":["default"]}},
                "startEpoch":"1338499002",
                "runTime":"1444",
                "execType":"user:binary",
                "exec":"lu.D.64",
                "numNodes":"4",
                "sha1":"c6dc16d25c2f23d2a3321d4feed16ab7e10c2cc1",
                "execEpoch":1336766748,
                "execModify":"Fri May 11 15:05:48 2012",
                "startTime":"Thu May 31 16:16:42 2012",
                "numCores":"64",
                "sizeT":{"bss":"199850984","text":"474218","data":"27064"}}],

For each job ID I only want to keep the "exec" field and the "jobID" field. How can I construct a regex to drop the rest of the data? Ideally, I want the following: JobID exec1 exec2 exec3
Is there some way to do this?

Thanks in advance.

amber4478
  • You mean `{"2597401": [{"JobID": 2597401, "exec": "ft.D.64"}]}` ? – Explosion Pills Apr 08 '13 at 00:22
  • Sort of. The initial digits are the job ID, so ideally I would want something like this: 2597401 ft.D.64 cg.C.64 mg.D.64 lu.D.64. There are multiple exec values for the same job, so I would like the job ID and each exec. – amber4478 Apr 08 '13 at 00:26
  • Use a JSON library that will read your JSON, let you manipulate it, and save it back out. That JSON library will already have been written, tested and debugged, unlike your code. Regular expressions are not a magic wand that you wave at every problem that happens to involve text. – Andy Lester Apr 08 '13 at 00:26
  • @amber4478 something like what? – Explosion Pills Apr 08 '13 at 00:27
  • 2597401 ft.D.64 cg.C.64 mg.D.64 lu.D.64 – amber4478 Apr 08 '13 at 00:30
  • Can you point me to a JSON library that could help me accomplish this? – amber4478 Apr 08 '13 at 00:31
  • @amber4478, that depends on the language platform you are using. – Qtax Apr 08 '13 at 00:37
  • Most likely Python, because I need to perform permutations on the data so that I can create a DSM showing co-occurrence of exec for each job. – amber4478 Apr 08 '13 at 00:45
  • @amber4478: Googling for "python json" turns up this page that seems to say that there's a JSON library built in to the Python standard library itself. http://docs.python.org/2/library/json.html – Andy Lester Apr 08 '13 at 04:30
  • possible duplicate of [json stringify : How to exclude certain fields from the json string](http://stackoverflow.com/questions/4910567/json-stringify-how-to-exclude-certain-fields-from-the-json-string) – Paul Sweatte Mar 13 '14 at 18:51
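
The JSON-library route suggested in the comments could look roughly like the sketch below, which uses Python's built-in `json` module. It assumes the complete file is valid JSON (the excerpt in the question is only a fragment and is missing a comma after `"runTime":"33"`, so it would have to be repaired first). The function name `summarize_jobs` and the cut-down `sample` string are made up for illustration.

```python
import json

def summarize_jobs(text):
    """Return one 'jobID exec1 exec2 ...' line per job (hypothetical helper)."""
    jobs = json.loads(text)  # raises json.JSONDecodeError on malformed input
    lines = []
    for job_id, runs in jobs.items():
        # Keep only the "exec" value of each run; every other field is dropped.
        execs = [run["exec"] for run in runs if "exec" in run]
        lines.append(" ".join([job_id] + execs))
    return lines

# Hypothetical, shortened sample in the same shape as the question's data.
sample = (
    '{"2597401": ['
    '{"jobID": "2597401", "runTime": "1022", "exec": "ft.D.64"},'
    '{"jobID": "2597401", "runTime": "33", "exec": "cg.C.64"}'
    ']}'
)

print("\n".join(summarize_jobs(sample)))
```

Because the data is parsed rather than pattern-matched, nested objects such as `pkgT` and `sizeT` need no special handling.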

1 Answer


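Because you did not specify your regex engine, I will assume you are using PCRE for my answer (the pattern relies on PCRE features such as (*SKIP), recursion, and possessive quantifiers).

Given the JSON structure, you can use this regex to match the unneeded key-value pairs and replace them with nothing: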

/(,\s*(*SKIP))?+("(?!jobID"|exec)[^"]+"\s*+:\s*+("[^"]*"|{(?2)?+(?>,\s*(?2))*}|\[(?3)?+(?>,\s*(?3))*\]))(?(1)|,?)/g

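Here is the result after applying the regex replacement. Note that the lookahead only checks for the prefix exec, so "execType", "execEpoch", and "execModify" are kept along with "exec":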

       "2597401":[{"jobID":"2597401",
                 "execType":"user:binary",              
                 "exec":"ft.D.64",
                 "execEpoch":1336766742,
                 "execModify":"Fri May 11 15:05:42 2012"},  
                 {"jobID":"2597401"  "execType":"user:binary",
                 "exec":"cg.C.64",
                 "execEpoch":1336766735,
                "execModify":"Fri May 11 15:05:35 2012"},
                {"jobID":"2597401",
                "execType":"user:binary",
                "exec":"mg.D.64",
                "execEpoch":1336766756,
                "execModify":"Fri May 11 15:05:56 2012"},{"jobID":"2597401",
                "execType":"user:binary",
                "exec":"lu.D.64",
                "execEpoch":1336766748,
                "execModify":"Fri May 11 15:05:48 2012"}],

As you can see, the resulting string has invalid syntax at '"jobID":"2597401"  "execType":"user:binary"'. This comes from a syntax error in your given data: the comma after "runTime":"33" is missing.
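
Since Python came up in the comments, here is a small sketch of how a missing comma like this could be located before any further processing. It relies on the `lineno`, `colno`, and `msg` attributes of `json.JSONDecodeError` from the standard library; the helper name `find_json_error` and the one-line `broken` sample are made up for illustration.

```python
import json

def find_json_error(text):
    """Return (line, column, message) of the first syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None

# Hypothetical one-line sample reproducing the missing comma from the question.
broken = '{"runTime":"33"  "execType":"user:binary"}'
print(find_json_error(broken))
```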

With explanation:

/(,\s*(*SKIP))?+
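# Optionally matches a leading comma and whitespace,
# possessively (no backtracking into the group).
# If the comma is matched, the (*SKIP) verb marks this position:
# should the rest of the pattern fail, the next match attempt
# starts here instead of at the following character.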

# Key - Value pairs not worthy of keeping.
(
  "(?!jobID"|exec)[^"]+" # A key that is not jobID and does not start with exec.
  \s*+:\s*+ # The colon, with optional whitespace on either side.
  ( # The value, matched recursively.
    "[^"]*"
      # String literals, boring.
    | {(?2)?+(?>,\s*(?2))*}
      # Or: An object storing some key-value pairs
      # we don't care about.
    | \[(?3)?+(?>,\s*(?3))*\]
      # Or: An array storing some values
      # we don't care about.
  )
)
(?(1)|,?)
# Balance the comma (so the result string is still valid JSON)
/gx

Here is a regex demo.
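
The pattern above needs a PCRE-style engine; Python's built-in `re` module does not support recursion or (*SKIP). If the goal is only the "JobID exec1 exec2 exec3" line from the question, a much simpler extraction works in plain `re` and tolerates the missing comma in the data. This is a rough sketch, not a JSON parser: it assumes that "jobID" appears before "exec" in each object (as in the sample) and that the values contain no escaped quotes. The names `PAIR` and `execs_by_job` and the shortened `sample` are made up for illustration.

```python
import re

# Matches only the exact keys "jobID" and "exec" (not "execType" etc.)
# followed by a quoted string value.
PAIR = re.compile(r'"(jobID|exec)"\s*:\s*"([^"]*)"')

def execs_by_job(text):
    """Map each jobID to the list of exec values that follow it."""
    jobs = {}
    current = None
    for key, value in PAIR.findall(text):
        if key == "jobID":
            current = value
            jobs.setdefault(current, [])
        elif current is not None:
            jobs[current].append(value)
    return jobs

# Hypothetical, shortened sample that keeps the missing comma from the question.
sample = (
    '"2597401":[{"jobID":"2597401","runTime":"1022","exec":"ft.D.64"},'
    '{"jobID":"2597401","runTime":"33"  "execType":"user:binary","exec":"cg.C.64"}],'
)

for job_id, execs in execs_by_job(sample).items():
    print(job_id, *execs)
```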

CSᵠ
JavaBot