
The bookmarks file for the Vivaldi browser (based on Chromium) tends to accumulate a huge number of base64-encoded thumbnails taking up a lot of space, and I would like to remove these entries. The file is a JSON file and an entry looks like this:

{
  "date_added": "13215828073144281",
  "guid": "3ace3174-ea60-42c5-88cf-e535a150ae38",
  "id": "74",
  "meta_info": {
     "Thumbnail": "data:image/jpeg;base64,/9j/4AAQSkZJRgA....AUpSgFKUoBSlKA//2Q=="
  },
  "name": "RIPE WHOIS IP Address Database Search › Look up an IP addres… - iTools",
  "type": "url",
  "url": "http://itools.com/tool/ripe-whois-ip-address"
},

I already have a jq filter looking like this:

jq 'walk(if type == "object" then with_entries(select(.key | test("Thumbnail") | not)) else . end)' Bookmarks > Bookmarks2

The problem is this also deletes entries containing custom thumbnails like this:

"Thumbnail": "chrome://vivaldi-data/local-image/aa0d8713-99c6-4fcb-a725-a29235c4e8b0",

So the question is: how would I remove only the `Thumbnail` entries whose value contains or starts with the string `data:image`?

Stian Lund
  • I'm not very experienced with JQ (or JSON for that matter), so apologies if I'm not using the correct terms above. I also wasn't able to find another answer exactly matching this specific case. – Stian Lund Oct 16 '21 at 12:56
  • I don't know much `jq` either but a workaround would be to use `jq` just to pretty print the json, then use `sed` to remove the lines containing `"Thumbnail": "data:image`? That might still be valid json? – SamBob Oct 16 '21 at 13:05
  • @SamBob Yes, `sed` would be good, but for some reason it seems to break the JSON, and clears everything until it gets a new copy from the Sync server. While using the `jq` tool works, not sure why. – Stian Lund Oct 16 '21 at 13:10

2 Answers


Something like this should do the trick:

del(recurse | objects | select(has("Thumbnail")) .Thumbnail | select(startswith("data:image")))
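To see how this behaves on nested data, here is a minimal sketch run against a made-up sample. The file names `sample.json` and `cleaned.json` and the `roots`/`bookmark_bar` layout are assumptions for illustration, not Vivaldi's confirmed structure, and the filter is written with an explicit pipe before `.Thumbnail`:

```shell
# Hypothetical sample: one embedded thumbnail, one custom thumbnail,
# and one embedded thumbnail inside a nested folder.
cat > sample.json <<'EOF'
{"roots":{"bookmark_bar":{"children":[
  {"name":"embedded","meta_info":{"Thumbnail":"data:image/jpeg;base64,AAAA"}},
  {"name":"custom","meta_info":{"Thumbnail":"chrome://vivaldi-data/local-image/x"}},
  {"name":"folder","children":[
    {"name":"nested","meta_info":{"Thumbnail":"data:image/png;base64,BBBB"}}
  ]}
]}}}
EOF

# recurse visits every value at every depth; objects keeps only the objects;
# del() then removes just the Thumbnail paths whose value starts with data:image.
jq -c 'del(recurse | objects | select(has("Thumbnail")) | .Thumbnail | select(startswith("data:image")))' \
  sample.json > cleaned.json

cat cleaned.json
```

Both `data:image` thumbnails should be gone from `cleaned.json`, while the `chrome://vivaldi-data/...` entry stays. Writing to a second file first, as in the question, keeps the original bookmarks intact until the result has been checked.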
oguz ismail

You could add another constraint, `startswith("data:image") | not`, and `select` only the elements whose `.key` does not match or whose `.value` does not start with `data:image`, resulting in: `select((.key | test("Thumbnail") | not) or (.value | startswith("data:image") | not))`. Applying De Morgan's laws simplifies this to `select(((.key | test("Thumbnail")) and (.value | startswith("data:image"))) | not)`.
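Dropped into the `walk` filter from the question, the De Morgan form could look like the sketch below. The sample data and the file names `walk_sample.json` and `walk_cleaned.json` are made up for illustration. It relies on jq's `and` short-circuiting, so `.value | startswith(...)` is only evaluated for keys that match `Thumbnail`:

```shell
# Hypothetical sample with a nested folder, so walk has to descend more than one level.
cat > walk_sample.json <<'EOF'
{"children":[
  {"name":"embedded","meta_info":{"Thumbnail":"data:image/jpeg;base64,AAAA"}},
  {"name":"custom","meta_info":{"Thumbnail":"chrome://vivaldi-data/local-image/x"}},
  {"name":"folder","children":[
    {"name":"nested","meta_info":{"Thumbnail":"data:image/png;base64,BBBB"}}
  ]}
]}
EOF

# Drop an entry only when BOTH conditions hold: the key matches "Thumbnail"
# and the value starts with data:image. Everything else is kept.
jq -c 'walk(if type == "object"
            then with_entries(select(((.key | test("Thumbnail")) and (.value | startswith("data:image"))) | not))
            else . end)' walk_sample.json > walk_cleaned.json

cat walk_cleaned.json
```

The custom `chrome://` thumbnail should survive while both embedded ones are removed. Depending on the jq version, `walk` may reorder object keys, which does not matter for JSON consumers.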

However, there's a simpler approach, assuming the overall structure is an array along the lines of

[
  {
    "date_added": "13215828073144281",
    "guid": "3ace3174-ea60-42c5-88cf-e535a150ae38",
    ...
  },
  {
    "date_added": "13215828073144282",
    "guid": "3ace3174-ea60-42c5-88cf-e535a150ae39",
    ...
  },
  ...
]

Then simply call

jq 'map(del(.meta_info.Thumbnail | select(startswith("data:image"))))' Bookmarks
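One caveat with this filter: as far as I can tell, `startswith` raises an error on non-string input, so an entry without a `meta_info.Thumbnail` (where the path yields `null`) would abort the run. A slightly more defensive sketch, assuming the same flat array, inserts `strings` so that only string values are tested. The file names `flat.json` and `flat_cleaned.json` are made up for illustration:

```shell
# Hypothetical flat array: one embedded thumbnail, one custom thumbnail,
# and one entry with no meta_info at all.
cat > flat.json <<'EOF'
[
  {"name":"embedded","meta_info":{"Thumbnail":"data:image/jpeg;base64,AAAA"}},
  {"name":"custom","meta_info":{"Thumbnail":"chrome://vivaldi-data/local-image/x"}},
  {"name":"plain","url":"http://itools.com/tool/ripe-whois-ip-address"}
]
EOF

# strings passes only string values through, so entries without a thumbnail
# are skipped instead of reaching startswith with null.
jq -c 'map(del(.meta_info.Thumbnail | strings | select(startswith("data:image"))))' \
  flat.json > flat_cleaned.json

cat flat_cleaned.json
```

This still only looks at the top-level array elements, so it does not reach bookmarks nested inside folders.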
pmf
  • Thanks - I tried the simplified approach, but jq just dumps core, I get that a lot when testing stuff and not sure if it's Cygwin or something else. – Stian Lund Oct 16 '21 at 14:07
  • The others work but all those parentheses get confusing ;) `jq 'walk(if type == "object" then with_entries(select((.key | test("Thumbnail") | not) or (.value | startswith("data:image") | not))) else . end)' Bookmarks` – Stian Lund Oct 16 '21 at 14:14
  • Maybe your overall structure isn't a simple array, then. Unfortunately, I don't know Vivaldi's export data structures; I just tried to take out the complexity of your `walk` and apply a `map` instead. If the structure is complex enough, however, then go with @oguz's approach using `recurse`, which to some extent is similar to your `walk` but utilizes the same technique as I do (`select` and `del`). – pmf Oct 16 '21 at 14:19
  • Yeah, it's a complex structure, several layers deep. For an example (in case you're interested), see: https://pastebin.com/S3r62qjk . I added the NRK bookmark there with a thumbnail. – Stian Lund Oct 16 '21 at 14:29