1

This question is independent of implementation language.

In a recent interview I was asked to write a JSON parser : The input I was given was something like below:

 {
      'key1': 'value1',
      'key2': [key21, key22]
    }

As simple as it may sound, I was stumped as I did not know how to write the parser (btw I am aware of JSON.parse() methods).

The question was to write your own parser.

It matters little that the above JSON is not in the right format. The parse should throw an error if it is not.

Can someone point me towards some technique I could have used to solve this problem.

runtimeZero
  • 26,466
  • 27
  • 73
  • 126
  • 7
    Well that is not valid JSON – epascarello Nov 09 '15 at 18:13
  • javascript or java? Have you made any attempt to research this yet? – tnw Nov 09 '15 at 18:18
  • 1
    As epascarello points out that is not valid JSON. Please see http://www.json.org/ for valid json. If you just want to parse JSON use a library off the shelf. If you just want to create code to parse, use a BNF to parser code generator using the BNF like syntax on the right of the json.org page. If on the other hand you are doing this as homework, then you should probably work with your TAs to understand what you need to learn. – Daniel Moses Nov 09 '15 at 18:20
  • 1
    I found a simple enough starting point https://raw.githubusercontent.com/douglascrockford/JSON-js/master/json_parse.js thank you all for your input – runtimeZero Nov 11 '15 at 14:55

1 Answers1

4

Basically, you'll first have to scan through your string character by character in order to separate it into tokens.

  • { and } are tokens. Let's call them 'START_OBJECT' and 'END_OBJECT';
  • [ and ] are also tokens. Let's call them 'START_ARRAY' and 'END_ARRAY';
  • : and , are tokens, too. Let's call them 'COLON' and 'COMMA';
  • 'key1', 'key2' and 'value1' are 'STRING_CONSTANT' tokens, whose values are the strings themselves;
  • key21 and key22 are IDENTIFIER tokens, whose values are 'key21' and 'key22'

That initial part of the parsing process, where you break your source into tokens is called 'lexical analisis', and is the first step in the parsing process.

The next step would be the 'syntactical analysis', where you would figure out the actual structure of the source from the sequence of tokens.

Haroldo_OK
  • 6,612
  • 3
  • 43
  • 80