
I have a string containing multiple-choice questions and answers, as follows:

(1) Capital of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla। Ans (A) Dhaka
(2) Largest city of Bangladesh is-
(A) Mirpur (B) Rangpur (C) Chittagong (D) Comilla। Ans (C) Chittagong
(3) Smallest city of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Meherpur। Ans (B) Rangpur

I need to create JSON from the above string, with each question and its answer as a separate object.

The final JSON should look like this:

{
  "questions": [
    {
      "options": ["Dhaka", "Rangpur", "Chittagong", "Comilla"],
      "body": "Capital of Bangladesh is-",
      "answers": ["A"]
    },
    {
      "options": ["Mirpur", "Rangpur", "Chittagong", "Comilla"],
      "body": "Largest city of Bangladesh is-",
      "answers": ["C"]
    }
  ]
}

I tried this:

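    var questions = [];
    // reader.result: the loaded file text (reader is presumably a FileReader)
    var result = reader.result.split('\n');
    for (var index = 0; index < result.length; index++) {
      var question = result[index];
      // a regex literal, not a string: matches lines starting with "(1)", "(2)", ...
      if (/^\(\d+\)/.test(question)) {
        questions.push(question); // push is a method call, not an assignment
      } else {
        questions.push(question);
      }
    }
    console.log(questions);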

How can I make this work?

2 Answers


Have a go with this

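We use the /u flag for Unicode-aware matching, and .+ instead of \w, because in JavaScript \w only matches the ASCII word characters [A-Za-z0-9_], so it cannot match Bengali letters such as ক.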
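A quick check of the \w point, using an English and a Bengali option label from the sample data (the variable names are only for illustration):

```javascript
// In JavaScript, \w only matches the ASCII characters [A-Za-z0-9_],
// even with the /u flag, so it cannot match Bengali letters.
const english = "(A) Dhaka";
const bengali = "(ক) বাংলা";

const wordPattern = /\((\w)\) (\w+)/u; // ASCII word characters only
const dotPattern = /\((.+?)\) (.+)/u;  // any characters except line breaks

console.log(wordPattern.exec(english)?.slice(1)); // [ 'A', 'Dhaka' ]
console.log(wordPattern.exec(bengali));           // null
console.log(dotPattern.exec(bengali)?.slice(1));  // [ 'ক', 'বাংলা' ]
```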

More on Unicode-aware regexes: Regular expression \p{L} and \p{N}
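A small check of what those property escapes match. They require the /u flag; the test strings are made up, with the Bengali digits ১২ standing in for 12:

```javascript
// Unicode property escapes such as \p{L} require the /u flag.
const isLetter = /^\p{L}$/u;    // one letter from any script
const isNumber = /^\p{N}+$/u;   // any numeric characters (digits, fractions, ...)
const isDecimal = /^\p{Nd}+$/u; // decimal digits only, in any script

console.log(isLetter.test("A"), isLetter.test("ক"));     // true true
console.log(isDecimal.test("12"), isDecimal.test("১২")); // true true
console.log(/^\d+$/.test("১২"));                         // false (\d is ASCII-only)
console.log(isNumber.test("½"), isDecimal.test("½"));    // true false
```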

const str = `(1) The main language of Bangladesh is-
(ক) বাংলা (খ) ইংরেজি  (C) Hindi (D) French। Ans (ক) বাংলা
(2) Largest city of Bangladesh is-
(A) Mirpur (B) Rangpur (C) Chittagong (D) Comilla। Ans (C) Chittagong
(3) Smallest city of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Meherpur। Ans (B) Rangpur`;

const obj = str.split(/\n/u).reduce((acc,line,i) => { 
  if (i%2===0) acc.questions.push({"body":line.match(/\(.+\) (.*)/u)[1]}); // remove the (X) from the question
  else {
    const curItem = acc.questions[acc.questions.length-1]; // last pushed object
    let [optionStr,answer] = line.split(/। /u);// split on this special character
    // assuming 4 options 
    curItem.options = optionStr
      .match(/\(.+\) (.+) \(.+\) (.+) \(.+\) (.+) \(.+\) (.+)/u)
      .slice(1); // drop the first element from the result (full match)
    answer = answer.match(/\((.+)\)/u)[1]; // just get the letter from the bracket
    curItem.answers = [answer];
  }  
  return acc
},{questions:[]})

console.log(obj)
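The reduce above relies on two assumptions: question and option lines strictly alternate, and every question has exactly four options. A minimal sketch that relaxes both, assuming each question line starts with a number in parentheses and each option label is a single letter in parentheses (parseQuiz is a hypothetical name; the first sample question is trimmed to two options to show a different option count):

```javascript
// Hypothetical helper: turns the quiz text into { questions: [...] }.
// Assumes "(n) body" lines, each followed by one "(X) opt (Y) opt ...। Ans (X) ..." line.
function parseQuiz(text) {
  const questions = [];
  for (const line of text.split(/\r?\n/)) {
    const numbered = line.match(/^\(\p{Nd}+\)\s*(.*)$/u);
    if (numbered) {
      // A numbered line starts a new question object
      questions.push({ body: numbered[1].trim() });
      continue;
    }
    const current = questions[questions.length - 1];
    if (!current || !line.includes("।")) continue; // skip blank or unexpected lines

    const [optionStr, answerStr] = line.split("।");
    // Split on every "(X)" label, so any number of options works
    current.options = optionStr
      .split(/\s*\(\p{L}\)\s*/u)
      .map(s => s.trim())
      .filter(Boolean);
    // Keep only the letter inside the first "(X)" after "Ans"
    const answer = answerStr.match(/\((\p{L})\)/u);
    current.answers = answer ? [answer[1]] : [];
  }
  return { questions };
}

const quiz = parseQuiz(`(1) The main language of Bangladesh is-
(ক) বাংলা (খ) ইংরেজি। Ans (ক) বাংলা
(2) Largest city of Bangladesh is-
(A) Mirpur (B) Rangpur (C) Chittagong (D) Comilla। Ans (C) Chittagong`);

console.log(JSON.stringify(quiz, null, 2));
```

Keying on the "(n)" prefix rather than the line index means a blank line between questions is simply skipped instead of shifting every following question.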
mplungjan

You could also use a single pattern that captures the parts of each question and answer in capture groups. Then, for the options part, you can split on the uppercase letters between parentheses.
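For example, splitting the options part of the first sample question on its (A), (B), ... labels (a minimal sketch; the variable names are illustrative):

```javascript
// The options part of one line (the text before "।")
const optionsPart = "(A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla";

// Split on "(A)", "(B)", ... together with the surrounding whitespace.
// The first element is an empty string (nothing precedes "(A)"), so filter it out.
const options = optionsPart.split(/\s*\([A-Z]\)\s*/).filter(Boolean);

console.log(options); // [ 'Dhaka', 'Rangpur', 'Chittagong', 'Comilla' ]
```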

The pattern with capture groups:

^\(\d+\) (.+)\n(\([A-Z]\).*?)। Ans \(([A-Z])\)
  • ^ Start of the string (start of each line when the m flag is used, as in the code below)
  • \(\d+\) Match 1+ digits between parentheses, followed by a space
  • (.+)\n Capture group 1, match the rest of the line and a newline
  • (\([A-Z]\).*?) Capture group 2, match an uppercase char between parentheses followed by as few chars as possible
  • । Ans Match literally
  • \(([A-Z])\) Capture an uppercase char between parentheses in group 3
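Applied to the first question from the sample data, the three groups come out as follows (no g or m flag is needed for a single match):

```javascript
const oneQuestion = `(1) Capital of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla। Ans (A) Dhaka`;

// Same pattern as described above, used once without flags
const questionPattern = /^\(\d+\) (.+)\n(\([A-Z]\).*?)। Ans \(([A-Z])\)/;
const groups = oneQuestion.match(questionPattern);

console.log(groups[1]); // Capital of Bangladesh is-
console.log(groups[2]); // (A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla
console.log(groups[3]); // A
```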


Or, using Unicode property escapes where supported (they require the u flag):

^\(\p{Nd}+\)\s+(.+)\n(\(\p{L}\).*?)।\s+Ans\s+\((\p{L})\)
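The difference shows up when the numbering and labels are not ASCII. Below, the Bengali sample line from the first answer is numbered with the Bengali digit ১ and trimmed to its two Bengali options (made-up input for illustration):

```javascript
const asciiPattern = /^\(\d+\) (.+)\n(\([A-Z]\).*?)। Ans \(([A-Z])\)/;
const unicodePattern = /^\(\p{Nd}+\)\s+(.+)\n(\(\p{L}\).*?)।\s+Ans\s+\((\p{L})\)/u;

const bengaliNumbered = `(১) The main language of Bangladesh is-
(ক) বাংলা (খ) ইংরেজি। Ans (ক) বাংলা`;

// \d and [A-Z] are ASCII-only, so the first pattern finds nothing
console.log(asciiPattern.test(bengaliNumbered)); // false

// \p{Nd} accepts the Bengali digit, \p{L} the Bengali letter labels
const unicodeMatch = bengaliNumbered.match(unicodePattern);
console.log(unicodeMatch[1]); // The main language of Bangladesh is-
console.log(unicodeMatch[3]); // ক
```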


In the code, the value of group 1 is i[1], group 2 is i[2], and so on.

const str = `(1) Capital of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla। Ans (A) Dhaka
(2) Largest city of Bangladesh is-
(A) Mirpur (B) Rangpur (C) Chittagong (D) Comilla। Ans (C) Chittagong
(3) Smallest city of Bangladesh is-
(ক) বাংলা (খ) ইংরেজি । Ans (B) Rangpur`;
const regex = /^\(\p{Nd}+\)\s+(.+)\n(\(\p{L}\).*?)।\s+Ans\s+\((\p{L})\)/gum;

let result = {
  questions: Array.from(str.matchAll(regex)).map(i =>
    ({
      options: i[2].split(/\s*\(\p{L}\)\s*/u).filter(Boolean),
      body: i[1],
      answers: [i[3]]
    })
  )
};

console.log(result);

Or an example using the negated character class [^()]+ to match whatever is between the parentheses:

const str = `(1) Capital of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla। Ans (A) Dhaka
(2) Largest city of Bangladesh is-
(A) Mirpur (B) Rangpur (C) Chittagong (D) Comilla। Ans (C) Chittagong
(3) Smallest city of Bangladesh is-
(ক) বাংলা (খ) ইংরেজি । Ans (B) Rangpur`;
const regex = /^\([^()]+\)\s+(.+)\n(\([^()]+\).*?)।\s+Ans\s+\(([^()]+)\)/gm;

let result = {
  questions: Array.from(str.matchAll(regex)).map(i =>
    ({
      options: i[2].split(/\s*\([^()]+\)\s*/).filter(Boolean),
      body: i[1],
      answers: [i[3]]
    })
  )
};

console.log(result);
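One practical difference from the \p{L} version, shown with made-up roman-numeral labels that are not in the original data: \p{L} matches exactly one letter, while [^()]+ accepts any label text.

```javascript
// Made-up input: labels like "(iii)" are longer than one letter
const romanLabels = `(1) Largest city of Bangladesh is-
(i) Mirpur (ii) Rangpur (iii) Chittagong (iv) Comilla। Ans (iii) Chittagong`;

const singleLetterLabels = /^\(\p{Nd}+\)\s+(.+)\n(\(\p{L}\).*?)।\s+Ans\s+\((\p{L})\)/gum;
const anyLabels = /^\([^()]+\)\s+(.+)\n(\([^()]+\).*?)।\s+Ans\s+\(([^()]+)\)/gm;

// \p{L} matches exactly one letter, so the "(iii)" answer label defeats it
console.log(Array.from(romanLabels.matchAll(singleLetterLabels)).length); // 0

// [^()]+ accepts any label text between the parentheses
const [found] = romanLabels.matchAll(anyLabels);
const parsed = {
  options: found[2].split(/\s*\([^()]+\)\s*/).filter(Boolean),
  body: found[1],
  answers: [found[3]]
};
console.log(parsed);
```

The trade-off is that [^()]+ no longer checks what the label is, so it would also accept something like "(see above)" as a label.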
The fourth bird