
I have JSON objects in this format:

 {
     "1f626": {
         "name": "frowning face with open mouth",
         "ascii": [],
         "code_points": {
             "base": "1f626",
             "default_matches": [
                 "1f626"
             ],
             "greedy_matches": [
                 "1f626"
             ],
             "decimal": ""
         }
     }
 }

I need to remove the code_points object using a regular expression.


I have tried using this RegEx:

(("code\w+)(.*)(}))

But it only selects the first line. The match has to continue to the closing curly bracket so that the whole code_points object is removed.

How can I do that?


Note: I have to remove it using Regular Expressions and not JavaScript. Please don't post any JavaScript answers or mark this as a possible duplicate of a JavaScript-based question.

Wiktor Stribiżew
Mina
  • Just `delete obj["1f626"]["code_points"]` – KaiserKatze Aug 26 '18 at 04:03
  • @KaiserKatze Using JavaScript? – Mina Aug 26 '18 at 04:05
  • Yes. Just try `delete obj["1f626"]["code_points"]`, with `obj` being the object in your code. – KaiserKatze Aug 26 '18 at 04:09
  • Reference: [1](https://stackoverflow.com/questions/6485127/how-to-delete-unset-the-properties-of-a-javascript-object); [2](https://stackoverflow.com/questions/208105/how-do-i-remove-a-property-from-a-javascript-object); [3](https://stackoverflow.com/questions/1596782/how-to-unset-a-javascript-variable). – KaiserKatze Aug 26 '18 at 04:12
  • I have to remove it using a regular expression, not JavaScript. – Mina Aug 26 '18 at 04:14
  • If so, you are asking for [Lexical analysis](https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization) feature. – KaiserKatze Aug 26 '18 at 04:18
  • JSON isn't a regular language; it is actually awful to use regex on JSON, and that's why we have JSON parsers. I dread to think who is forcing you to use regex :-( – Chris Cousins Aug 26 '18 at 04:18
  • Possible duplicate of [How to implement Lexical Analysis in Javascript](https://stackoverflow.com/questions/4726539/how-to-implement-lexical-analysis-in-javascript). – KaiserKatze Aug 26 '18 at 04:22
  • If you have to do it without JavaScript, maybe the javascript tag on the question isn't appropriate? – caedmon Aug 26 '18 at 04:23
  • I hope you understand that **JSON encoder/decoder is not written in Regular Expression**. – KaiserKatze Aug 26 '18 at 04:29
  • @Mina You could choose to use a [**JSON encoder/decoder written in C/C++**](https://json.org/). – KaiserKatze Aug 26 '18 at 04:32
  • This is a large JSON file (https://raw.githubusercontent.com/delowar64/emoji-finder/master/src/emoji/emoji.json), and I have to reduce its size; that's why I want to use a regular expression. @KaiserKatze – Mina Aug 26 '18 at 04:32
  • @Mina I don't understand why you choose not to use JavaScript. Efficiency? – KaiserKatze Aug 26 '18 at 04:34
  • @KaiserKatze If I use the JavaScript method, how can I get the output as the content of a new file? – Mina Aug 26 '18 at 04:40
  • @Mina If you use JavaScript to handle JSON, you should learn to use [**Node.js**](https://nodejs.org/en/), which is a standalone JavaScript Engine. – KaiserKatze Aug 26 '18 at 04:43
  • @Mina You could try to use [**Python**](https://www.python.org) to handle JSON as well. – KaiserKatze Aug 26 '18 at 04:43

2 Answers


Alternatively, if you can use jq at the command line:

jq "del(.[].code_points)" <monster.json >smaller_monster.json

This deletes the code_points key inside each 2nd-level object.

It took my machine about 5 seconds on a 60MB document.

It's not a regular expression, but it's not JavaScript either. So it meets half of your non-functional requirements.
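For comparison, the same deletion can be done with Python's standard `json` module, which KaiserKatze suggested in the comments. This is a minimal sketch, not part of the answer above: `drop_code_points` and `shrink_file` are hypothetical helper names, it assumes every top-level value is an object (as in the question), and writing compact output without indentation is an extra step aimed at the file-size goal that the jq command does not perform.

```python
import json


def drop_code_points(data: dict) -> dict:
    """Remove "code_points" from every 2nd-level object, like jq's del(.[].code_points)."""
    # Assumes each top-level value is a dict, as in the question's sample.
    for entry in data.values():
        # pop() with a default leaves entries without the key untouched
        entry.pop("code_points", None)
    return data


def shrink_file(src: str, dst: str) -> None:
    """Read src, drop every code_points object, and write compact JSON to dst."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    drop_code_points(data)
    with open(dst, "w", encoding="utf-8") as f:
        # Compact separators drop the indentation whitespace as well;
        # ensure_ascii=False keeps non-ASCII text as-is instead of \u escapes.
        json.dump(data, f, ensure_ascii=False, separators=(",", ":"))


sample = """{
    "1f626": {
        "name": "frowning face with open mouth",
        "ascii": [],
        "code_points": {"base": "1f626", "decimal": ""}
    }
}"""
result = drop_code_points(json.loads(sample))
print(json.dumps(result, separators=(",", ":")))
# {"1f626":{"name":"frowning face with open mouth","ascii":[]}}
```

Calling `shrink_file("monster.json", "smaller_monster.json")` would mirror the jq invocation. Because a real parser is doing the work, nesting inside code_points and key order do not matter here.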

Tom Blodget

("code_points")([\s\S]*?)(})

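The problem you had is that `.` matches any character except `\n`, so the match cannot continue past the end of a line. In this case I usually use `[\s\S]`, which means any whitespace or non-whitespace character, i.e. any character at all. You should also make the `*` quantifier lazy by adding `?`, so that the match stops at the first `}` after "code_points" instead of running on to the last `}` in the text.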
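The explanation above can be checked with Python's `re` module, used here only as a test harness since the regex itself is the answer. This sketch assumes the sample object from the question as input; it runs the lazy `[\s\S]*?` pattern, the equivalent `.*?` with the `re.DOTALL` flag (which lets `.` match `\n`), and a greedy version for contrast.

```python
import re

# Sample object from the question (assumed input).
doc = """{
    "1f626": {
        "name": "frowning face with open mouth",
        "ascii": [],
        "code_points": {
            "base": "1f626",
            "default_matches": [
                "1f626"
            ],
            "greedy_matches": [
                "1f626"
            ],
            "decimal": ""
        }
    }
}"""

# The answer's pattern: [\s\S] matches any character, including "\n",
# and the lazy *? stops at the first "}" after "code_points".
lazy = re.compile(r'("code_points")([\s\S]*?)(})')
match = lazy.search(doc)
print(match.group(0))  # the whole code_points object, up to its closing brace

# Equivalent: "." with re.DOTALL, which lets "." match "\n" as well.
dotall = re.compile(r'("code_points")(.*?)(})', re.DOTALL)
print(dotall.search(doc).group(0) == match.group(0))  # True

# Without the "?", the greedy * runs on to the LAST "}" in the text,
# swallowing the closing braces of the enclosing objects too.
greedy = re.compile(r'("code_points")([\s\S]*)(})')
print(greedy.search(doc).end() == len(doc))  # True

# Deleting the lazy match removes the code_points object from the text.
removed = lazy.sub("", doc)
print(removed)
```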

Remember that this regular expression won't work properly if code_points contains an inner object (another {}), because the lazy match stops at the first }, which would then belong to the inner object.
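The caveat above, plus one the answer does not mention (deleting only the matched text leaves the comma after `"ascii": []` behind, which makes the result invalid JSON), can be seen in the sketch below. The `variant` pattern is a hypothetical adjustment, not from the answer: it also consumes the preceding comma and uses `[^{}]*` so it refuses to match across a nested object. It assumes code_points is never the first key in its object and that its values contain no literal `{` or `}` characters.

```python
import json
import re


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Shortened version of the sample object from the question.
doc = """{
    "1f626": {
        "name": "frowning face with open mouth",
        "ascii": [],
        "code_points": {
            "base": "1f626",
            "default_matches": ["1f626"],
            "decimal": ""
        }
    }
}"""

answer_pattern = re.compile(r'("code_points")([\s\S]*?)(})')

# 1) The answer's pattern leaves the comma after "ascii": [] dangling.
naive = answer_pattern.sub("", doc)
print(is_valid_json(naive))  # False

# 2) Hypothetical variant: also consume the preceding comma, and use [^{}]*
#    so a match can never extend across a nested "{...}" inside code_points.
#    Assumes code_points is never the first key and contains no { or }.
variant = re.compile(r',\s*"code_points"\s*:\s*\{[^{}]*\}')
cleaned = variant.sub("", doc)
print(json.loads(cleaned))
# {'1f626': {'name': 'frowning face with open mouth', 'ascii': []}}

# 3) With a nested object, the lazy pattern stops at the INNER "}" and
#    leaves broken text behind; the variant simply does not match.
nested = '{"a": {"x": 1, "code_points": {"inner": {"k": 1}, "z": 2}}}'
print(answer_pattern.sub("", nested))  # {"a": {"x": 1, , "z": 2}}}
print(variant.search(nested))  # None
```

Failing to match is the safer outcome here: the nested object is left untouched instead of being cut in half, and it can then be handled with a parser.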

CrafterKolyan