
I have a project requirement. I am getting the data in text format, as shown below.

SL NO  POLICY NO  AMOUNT  NAME            CGST TAX
02     33051090   195.0   D BL ESSENTIAL  9.00%
03     33051091   195.1   D HRFL COD      9.00%

I need to process this text and build JSON from it, like this:

[{
"SL NO":"02",
"POLICY NO":"33051090",
"AMOUNT":"195.0",
"NAME":"D BL ESSENTIAL",
"CGST TAX":"9.00%"
},
{
"SL NO":"03",
"POLICY NO":"33051091",
"AMOUNT":"195.1",
"NAME":"D HRFL COD",
"CGST TAX":"9.00%"
}]

I can't think of any logic to differentiate the values and map them to JSON properties, because there is a lot of whitespace between them.

There is no unique separator between the fields I am getting, so it is not like CSV data.

Lelio Faieta
  • Possible duplicate of [Convert CSV data into JSON format using Javascript](https://stackoverflow.com/questions/27979002/convert-csv-data-into-json-format-using-javascript) – Ashen Gunaratne Jun 11 '19 at 12:33
  • @AshenGunaratne The code there depends on an unambiguous delimiter between fields. This file doesn't seem to have that. – Barmar Jun 11 '19 at 12:42
  • I would go back to the provider of the data and ask for a delimited file... – Heretic Monkey Jun 11 '19 at 12:51
  • I must agree with @HereticMonkey. Get a properly formatted source file to read in, otherwise there is just too much risk with attempting to write code to figure out what is truly meant in the values. Garbage in leads to garbage out. – edjm Jun 11 '19 at 12:55

2 Answers


Since all fields except the name are numeric, you can match them with a regular expression. The name is everything between the amount and tax percentage.

let re = /^(\d+)\s+(\d+)\s+([\d.]+)\s+(.*?)\s+([\d.]+%)$/;
let data = `SL NO POLICY NO AMOUNT NAME CGST TAX
02   33051090  195.0  D BL ESSENTIAL 9.00%
03  33051091  195.1    D HRFL COD  9.00%`;
let obj = [];
data.split('\n').forEach(line => {
  let match = line.match(re);
  if (match) {
    obj.push({
      "SL NO": match[1],
      "POLICY NO": match[2],
      "AMOUNT": match[3],
      "NAME": match[4],
      "CGST TAX": match[5]
    });
  }
});
console.log(obj);

Or, instead of depending on the other fields being numeric, you could assume that none of them contains embedded whitespace, so that only the name can contain spaces.

let re = /^(\S+)\s+(\S+)\s+(\S+)\s+(.*?)\s+(\S+)$/;
let data = `SL NO POLICY NO AMOUNT NAME CGST TAX
02   33051090  195.0  D BL ESSENTIAL 9.00%
03  33051091  195.1    D HRFL COD  9.00%`;
let obj = [];
data.split('\n').slice(1).forEach(line => {
  let match = line.match(re);
  if (match) {
    obj.push({
      "SL NO": match[1],
      "POLICY NO": match[2],
      "AMOUNT": match[3],
      "NAME": match[4],
      "CGST TAX": match[5]
    });
  }
});
console.log(obj);

.slice(1) skips the header line, which this looser pattern would otherwise match.
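
To see why the header only needs skipping in the second version, here is a minimal sketch that runs both patterns from this answer against the header line alone. The variable names (`numericRe`, `looseRe`, `header`) are illustrative, not part of the original code.

```javascript
// The two patterns from the answer above.
const numericRe = /^(\d+)\s+(\d+)\s+([\d.]+)\s+(.*?)\s+([\d.]+%)$/;
const looseRe = /^(\S+)\s+(\S+)\s+(\S+)\s+(.*?)\s+(\S+)$/;

const header = 'SL NO POLICY NO AMOUNT NAME CGST TAX';

// The header starts with letters, so the digit-based pattern cannot match it.
const numericMatch = header.match(numericRe);

// The loose pattern only needs whitespace-separated chunks, so it matches
// and would yield a bogus record if the header were not skipped.
const looseMatch = header.match(looseRe);

console.log(numericMatch);        // null
console.log(looseMatch.slice(1)); // [ 'SL', 'NO', 'POLICY', 'NO AMOUNT NAME CGST', 'TAX' ]
```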

Barmar
  • I would caution that this works only for this specific format and this specific data. Anything falling outside the pattern will cause it to fail; a value of "N/A" in one of the numeric columns, for instance. – Heretic Monkey Jun 11 '19 at 12:57
  • Good point. I've added a second version that just assumes those columns don't have any whitespace in them, only the name column can have spaces. – Barmar Jun 11 '19 at 13:04

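You could solve this with a regex, something like (\d+)\s+(\d+)\s+([\d\.]+)\s+([\w\s]+?)\s+([\d\.]+\%). The name group is lazy so that extra padding before the tax column is not captured as part of the name.

var re = /^(\d+)\s+(\d+)\s+([\d\.]+)\s+([\w\s]+?)\s+([\d\.]+\%)$/;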
var data = `SL NO POLICY NO AMOUNT NAME CGST TAX
02   33051090  195.0  D BL ESSENTIAL 9.00%
03  33051091  195.1    D HRFL COD  9.00%`;
var result = data.split("\n").slice(1).map(item => {
    var match = item.match(re);
    return {
       "SL NO": match[1],
       "POLICY NO": match[2],
       "AMOUNT": match[3],
       "NAME": match[4],
       "CGST TAX": match[5]
    };
});
console.log(result);

But this is prone to errors - as soon as the format varies slightly this all breaks. I'd echo what others said in the comments - get a better data format which is less ambiguous.
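
One way to make that fragility visible, rather than hitting a `TypeError` on a `null` match, is to collect the lines that do not fit the pattern. The following is only a sketch under assumptions: `parseRows`, `rows`, and `rejected` are hypothetical names, the pattern is along the lines of the one above with a lazy name group, and the row `04` with `N/A` is made-up sample data for illustration.

```javascript
// Pattern along the lines of the one above; the name group is lazy so that
// padding before the tax column is not captured as part of the name.
const re = /^(\d+)\s+(\d+)\s+([\d.]+)\s+([\w\s]+?)\s+([\d.]+%)$/;

// Hypothetical helper: returns the parsed rows plus the lines that did not
// fit the pattern, instead of throwing when match() returns null.
function parseRows(text) {
  const rows = [];
  const rejected = [];
  text.split('\n').slice(1).forEach(line => { // slice(1) skips the header
    const trimmed = line.trim();
    if (trimmed === '') return;               // ignore blank lines
    const match = trimmed.match(re);
    if (!match) {
      rejected.push(line);                    // keep for manual inspection
      return;
    }
    rows.push({
      "SL NO": match[1],
      "POLICY NO": match[2],
      "AMOUNT": match[3],
      "NAME": match[4],
      "CGST TAX": match[5]
    });
  });
  return { rows, rejected };
}

// Sample input; the last row is invented to show a non-matching line.
const sample = `SL NO POLICY NO AMOUNT NAME CGST TAX
02   33051090  195.0  D BL ESSENTIAL 9.00%
03  33051091  195.1    D HRFL COD  9.00%
04  33051092  N/A    D XYZ  9.00%`;

const { rows, rejected } = parseRows(sample);
console.log(rows.length, rejected.length); // 2 1
```

Reporting the rejected lines does not remove the ambiguity in the format; it only makes it harder for a malformed row to slip through unnoticed.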

Jamiec