I'm running OS X. What command-line tool could I use for this? I've got a large text file containing the JSON-like output below. I'm looking for a way to strip out the records that have no last_login_date, since I'm not interested in those. Here's the output:

{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "bar@foo.com"
}
{
        "_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
        "email" : "foo@bar.com",
        "last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}

Is sed or awk a candidate for this? If so, can you show me how to strip this record out of the file:

{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "bar@foo.com"
}
Benjamin W.
noober
  • [`jq`](https://stedolan.github.io/jq/) is a great CLI for parsing JSON, but note that your sample input is _not_ valid JSON. – mklement0 Apr 10 '17 at 17:22
  • Unfortunately, that's the file output I have to work with. Just seeing if there's a way to clean this. Thanks. – noober Apr 10 '17 at 17:24
  • Possible duplicate of [Parsing JSON with Unix tools](http://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools) – tripleee Apr 10 '17 at 17:47

2 Answers

If the records are exactly how you describe them, then you can use:

grep last_login_date -B 3 -A 1 yourFile.json > out.json

This greps for the line you're interested in and keeps the 3 lines before the match and the 1 line after it.
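
One thing to watch for: when matching records are not adjacent in the file, grep prints a `--` line between the groups of context lines. Here is a minimal sketch of how that could be cleaned up, assuming every matching record has exactly the layout shown in the question. The temporary sample file and the variable name `grep_result` are made up for illustration; the sample is just the question's two records repeated twice.

```shell
# Hypothetical sample file: the question's two records, repeated twice
# so that the input contains two non-adjacent matches.
sample=$(mktemp)
for i in 1 2; do
cat <<'EOF'
{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "bar@foo.com"
}
{
        "_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
        "email" : "foo@bar.com",
        "last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
EOF
done > "$sample"

# Keep 3 lines before and 1 line after each match, then drop the "--"
# separator lines that grep inserts between non-adjacent context groups.
grep_result=$(grep -B 3 -A 1 'last_login_date' "$sample" | grep -v '^--$')
printf '%s\n' "$grep_result"
rm -f "$sample"
```

The second grep only removes lines that consist of exactly `--`, so the record lines themselves pass through unchanged.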

neric

If the input were proper JSON, the third-party CLI jq would be the right tool (see the bottom of this answer).
Since it is not, regular text-processing utilities must be used.

neric's answer works with the BSD grep that comes with macOS, but relies on a very specific file layout.

awk allows for a more flexible solution (still assumes that the JSON objects in the input aren't nested, however):

awk -v RS='{' '/"last_login_date"/ { print RS $0 }' file
  • -v RS='{' sets RS, the input record separator, to {, which means that entire JSON-like objects are read one at a time (without the leading {).

  • Regex-matching pattern /"last_login_date"/ looks for substring "last_login_date" inside each record and only executes the associated action ({...}) if found.

  • print RS $0 simply prints matching records with the leading { (the value of RS) re-added.
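
To try the record-separator approach on the sample data, something like the following sketch could be used. It makes two small changes that are my own additions, not part of the command above: `printf` replaces `print`, because each record already ends in a newline and `print` would leave a blank line after it, and a second variant splits on double quotes to print only the email address. The temporary file and the variable names `awk_records` and `awk_emails` are hypothetical, and the sketch still assumes flat (non-nested) objects.

```shell
# Hypothetical sample file reproducing the two records from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "bar@foo.com"
}
{
        "_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
        "email" : "foo@bar.com",
        "last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
EOF

# Same filter as in the answer, but printf (instead of print) avoids
# appending an extra newline after each record, which already ends in one.
awk_records=$(awk -v RS='{' '/"last_login_date"/ { printf "%s", RS $0 }' "$sample")
printf '%s\n' "$awk_records"

# Variant: split each record on double quotes and print the string that
# follows the "email" key (two fields later: key, separator, value).
awk_emails=$(awk -v RS='{' -F '"' '
  /"last_login_date"/ {
    for (i = 1; i <= NF - 2; i++)
      if ($i == "email") print $(i + 2)
  }' "$sample")
printf '%s\n' "$awk_emails"
rm -f "$sample"
```

The second variant relies on the key and value being plain double-quoted strings without embedded quotes, which holds for the sample but is not guaranteed in general.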


If the input were proper JSON, using jq would make the processing both more robust and succinct:

jq 'select(.last_login_date)' file

The above simply selects (filters in) only those JSON objects in the input file that have a last_login_date property (whose value is neither null nor Boolean false).
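
If you'd rather get to jq anyway, one possible route is to rewrite the non-JSON parts with sed first. This is only a sketch: it assumes that `ObjectId("...")` and `ISODate("...")` are the only non-JSON constructs in the file (as in the sample), and the temporary file and the variable name `json_stream` are made up for illustration. The jq step is skipped if jq isn't installed.

```shell
# Hypothetical sample file reproducing the two records from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "bar@foo.com"
}
{
        "_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
        "email" : "foo@bar.com",
        "last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
EOF

# Replace ObjectId("...") and ISODate("...") with the bare quoted string.
# Assumption: these are the only two non-JSON wrappers in the file.
json_stream=$(sed -E 's/(ObjectId|ISODate)\(("[^"]*")\)/\2/g' "$sample")
printf '%s\n' "$json_stream"

# With valid JSON, jq can filter the records and extract the addresses.
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$json_stream" | jq -r 'select(.last_login_date) | .email'
else
  echo 'jq is not installed; skipping the jq step' >&2
fi
rm -f "$sample"
```

After the sed step, the file is a sequence of standalone JSON objects, which jq processes one at a time.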

mklement0