
I have JSON content that I want to convert to an array. The JSON content contains an HTML table with cell values. I want to collect those cell values into one single array, so I can use it further in my project.

I'm struggling to convert those cell values into an array.

So what I have is this:

JSON (very small snippet):

{
    "textpage": {
        "content": "<table width=\"512\">\r\n<tbody>\r\n<tr>\r\n<td width=\"64\">8211</td>\r\n<td width=\"64\">8231</td>\r\n<td width=\"64\">1309</td>\r\n<td width=\"64\">1333</td>\r\n<td width=\"64\">1011</td>\r\n<td width=\"64\">1035</td>\r\n<td width=\"64\">1062</td>\r\n<td width=\"64\">1087</td>\r\n</tr>\r\n<tr>\r\n<td>8212</td>\r\n<td>8232</td>\r\n<td>1311</td>\r\n<td>1334</td>\r\n<td>1012</td>\r\n<td>1036</td>\r\n<td>1063</td>\r\n<td>1091</td>\r\n</tr>\r\n<tr>\r\n<td>8218</td>\r\n<td>8233</td>\r\n<td>1312</td>\r\n<td>1335</td>\r\n<td>1013</td>\r\n<td>1037</td>\r\n<td>1064</td>\r\n<td>1092</td>\r\n</tr>\r\n<tr>\r\n<td>8219</td>\r\n<td>8239</td>\r\n<td>1313</td>\r\n<td>1336</td>\r\n<td>1014</td> 
///////// and whole lot more stuff //////////
    },

jQuery:

function getPostalcode(){

 var url = 'link-to-text-page';
 $.getJSON(url+'?format=json', function(data){
   var content = data.textpage.content,
       codes = [];

   $(content + 'tr td').each(function(){
     var code = $(this).text();
     codes.push(code)
   });
  codes.join('');   
  console.log(codes)
 });
}

That console.log returns this:

Array [ "   8211 8231 1309 1333 1011 1035 10…" ]

I need it to read like

Array ["8211,8231,1309,1333,1011,1035,10…" ]

The problem is that those cell values are text. So how can I convert those values into variables or an array?

I tried things like:

var code = $(this).text();
parsed = JSON.parse(code)

Or:

var code = $(this).text();
$(code).text(function(i, val) {
  return val.replace(/,/g, ", ");
});

Any help greatly appreciated!

Meules
  • start by changing `codes.join('');` to `codes.join(',');` then [trim()](https://www.w3schools.com/jsreF/jsref_trim_string.asp) your string – DIEGO CARRASCAL Oct 13 '17 at 13:39

1 Answer


Option 1: Quick hack regex

let html = data.textpage.content;
let values = html.match(/<td[^>]*>([^<]+)/g)
                 .map( s => s.replace(/<.+?>/,'') );
// [ '8211', '8231', '1309', '1333', '1011', '1035', '1062', '1087', … ]

Or, if you wanted an array of numbers:

let values = html.match(/<td[^>]*>([^<]+)/g)
                 .map( s => s.replace(/<.+?>/,'')*1 );
// [ 8211, 8231, 1309, 1333, 1011, 1035, 1062, 1087, … ]

Edit: Here's a more elegant regex solution:

let values = html.match(/[^<>]+(?=<\/td>)/g).map( s => s*1 );
// [ 8211, 8231, 1309, 1333, 1011, 1035, 1062, 1087, … ]

let values = html.match(/[^<>]+(?=<\/td>)/g).map(parseFloat);
// [ 8211, 8231, 1309, 1333, 1011, 1035, 1062, 1087, … ]

Note: Using parseInt instead of parseFloat in the above will give you very wrong answers. Can you figure out why?
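One way to see the answer to that puzzle: Array's map calls its callback with (value, index, array), and parseInt accepts a radix as its second argument, so each index ends up being used as the radix. Here is a minimal sketch, where `cells` is a made-up sample standing in for the strings returned by `html.match(...)`:

```javascript
// Hypothetical sample standing in for the strings returned by html.match(...)
const cells = ['8211', '8231', '1309', '1333'];

// map passes (value, index, array); parseInt reads the index as the radix:
// radix 0 falls back to base 10, radix 1 is invalid, radix 2 and 3 stop at
// the first digit that is not valid in that base.
const wrong = cells.map(parseInt);

// parseFloat ignores the extra arguments, and an explicit radix also works.
const viaParseFloat = cells.map(parseFloat);
const viaRadix10 = cells.map(s => parseInt(s, 10));

console.log(wrong);         // [ 8211, NaN, 1, 1 ]
console.log(viaParseFloat); // [ 8211, 8231, 1309, 1333 ]
console.log(viaRadix10);    // [ 8211, 8231, 1309, 1333 ]
```

Wrapping parseInt in an arrow function with an explicit radix of 10 is probably the safest choice if you specifically want integers.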

Option 2: jQuery HTML parsing and selection

let html = data.textpage.content;
// jQuery's .map() returns a jQuery object; .get() turns it into a plain array.
let values = $(html).find('td').map((_,td) => $(td).text()*1 ).get();
// [ 8211, 8231, 1309, 1333, 1011, 1035, 1062, 1087, … ]

Note that jQuery's map method is different from JavaScript's Array map method; in particular, the jQuery method provides the index of the value first, while the Array method provides the index second. This is why the jQuery version has (_,td): I'm 'throwing away' the index by assigning it to a variable named _ before getting to the parameter we care about.

Also, in case you are unfamiliar with the lambda syntax x => someCode() or (a,b,c) => someCode() as seen above, note that they are equivalent to function(x){ return someCode() } and function(a,b,c){ return someCode() }.
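As a small illustration of the last two points, here is a sketch using plain Array map only (no jQuery). The names `timesOneArrow`, `timesOneFunc` and `indexes` are made up for this example. For simple callbacks like these the two forms behave the same; one difference that does not matter here is that arrow functions do not get their own `this`.

```javascript
// An arrow function and the equivalent function expression.
const timesOneArrow = s => s * 1;
const timesOneFunc = function (s) { return s * 1; };

// Array's map passes (value, index). Naming a parameter _ only signals that
// it is unused; here the value is thrown away and the index is kept.
const indexes = ['a', 'b', 'c'].map((_, i) => i);

console.log(timesOneArrow('8211')); // 8211
console.log(timesOneFunc('8211'));  // 8211
console.log(indexes);               // [ 0, 1, 2 ]
```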

Phrogz
  • You're welcome. For more learning, I just edited it to add a shorter regex that gets just the content of each cell on the first pass. Note that the regex is fragile; it will break if your cells have any HTML markup in them. This is why you generally [should not use regular expressions to parse HTML](https://stackoverflow.com/a/1732454/405017), as the famous answer says. :) – Phrogz Oct 13 '17 at 13:58
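To make the fragility mentioned in that comment concrete, here is a minimal sketch that runs the shorter regex on two made-up rows, one plain and one where a cell wraps its value in a `<b>` tag. The sample strings are assumptions for illustration, not taken from the real page.

```javascript
const pattern = /[^<>]+(?=<\/td>)/g;

// Hypothetical sample rows; the second wraps one value in extra markup.
const plain = '<tr><td>8211</td><td>8231</td></tr>';
const withMarkup = '<tr><td><b>8211</b></td><td>8231</td></tr>';

const plainValues = plain.match(pattern).map(Number);
const markupValues = withMarkup.match(pattern).map(Number);

console.log(plainValues);  // [ 8211, 8231 ]
console.log(markupValues); // [ 8231 ]  (the bold cell is silently skipped)
```

If no cell matches at all, match returns null and the chained map call throws, so a DOM-based approach such as Option 2 is likely the more robust choice when the markup is not fully under your control.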