
I am trying to parse certain span tags from Winmo profiles (e.g., https://open.winmo.com/open/decision_makers/ca/pasadena/jorge/garcia/489325) that have no id or class values, for example:

<span itemprop="email">j****@***********.com</span>
<div itemscope="" itemprop="address" itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">177 East West Colorado Boulevard</span>
<span itemprop="addressLocality">Pasadena</span>,
<span itemprop="addressRegion">CA</span>
<span itemprop="postalCode">91195</span>
<span itemprop="addressCountry">USA</span>

I found two old Stack Overflow examples helpful (this and this), but I am still getting null values for each of the 9 spans with an itemprop attribute on the page when I run the following code:

var nodes=[], values=[];
var els = document.getElementsByTagName('span'), i = 0, whatev;
for(i; i < els.length; i++) {
    prop = els[i].getAttribute('itemprop');
    if(prop) {
        whatev = els[i];
        nodes.push(whatev.nodeName); // provides attribute names, in all CAPS = "SPAN"
        values.push(whatev.nodeValue); // for attribute values, why saying null if els[i] is fine?
        console.log(values); // (whatev) outputs whole thing, but it seems values is what I need
       // break; // need this? seems to prevent values after first span from generating
    }
}

How do I return just the partly-hidden email value (j****@***********.com) and the postalCode (91195) from these kinds of pages? I need the solution in plain JS because I will be compressing it into a bookmarklet for others.

2 Answers


You can get the email span via the selector

span[itemprop="email"]

and the postalCode with the same method

span[itemprop="postalCode"]

With those selectors, use querySelector to get to the element, then extract its textContent:

const [email, postalCode] = ['email', 'postalCode'].map(
  val => document.querySelector(`span[itemprop="${val}"]`).textContent
);
console.log(email);
console.log(postalCode);
<span itemprop="email">j****@***********.com</span>
<div itemscope="" itemprop="address" itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">177 East West Colorado Boulevard</span>
<span itemprop="addressLocality">Pasadena</span>,
<span itemprop="addressRegion">CA</span>
<span itemprop="postalCode">91195</span>
<span itemprop="addressCountry">USA</span>
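
As for why the original loop printed null: nodeValue is always null for element nodes, because the text lives in the element's child text node, which is what textContent reads. For a bookmarklet that may run on pages missing one of the fields, here is a minimal sketch of a null-safe variant. The helper name readItemprop and its root parameter are my own assumptions for illustration, not anything the page provides.

```javascript
// Hypothetical helper: returns the trimmed text of the first span whose
// itemprop equals `name`, or null when no such span exists under `root`.
function readItemprop(root, name) {
  const el = root.querySelector('span[itemprop="' + name + '"]');
  return el ? el.textContent.trim() : null;
}

// In a browser (or a bookmarklet), pass `document` as the root.
if (typeof document !== 'undefined') {
  console.log(readItemprop(document, 'email'));
  console.log(readItemprop(document, 'postalCode'));
}
```

Passing the root in, instead of hard-coding document, also makes it easy to limit the search to one part of the page.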
CertainPerformance
  • Excellent, efficient method! I didn't realize the array map method could be so elegantly applied to these non-standard span attributes/values. – Glenn Gutmacher Mar 24 '20 at 00:59

You can collect every itemprop name and its text into a single object, something like this:

function getItemPropsAsJSON(){
  var ob = {};
  Array.from(document.getElementsByTagName('span')).forEach(el=> {     
    var key = el.getAttribute('itemprop');
    var val = el.innerText;
    if (key && val) ob[key] = val;
  });
  return ob;
}
/* expected output: 
    {
      "name": "Jorge Garcia - Co-Founder & Chief Technology Officer, ICONIC | Contact Information, Email Address, Phone Number, Budgets and Responsibilities",
      "email": "j****@***********.com",
      "telephone": "(347) ***-****",
      "streetAddress": "177 East West Colorado Boulevard",
      "addressLocality": "Pasadena",
      "addressRegion": "CA",
      "postalCode": "91195",
      "addressCountry": "USA"
    }
*/
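
Note that the function above returns a plain object despite its name. If you need an actual JSON string (say, to show in a bookmarklet prompt), a rough sketch could look like the following. It assumes a hypothetical collectItemprops helper that lets the selector do the itemprop filtering and uses textContent rather than innerText. As with the version above, a repeated itemprop name overwrites the earlier value.

```javascript
// Hypothetical helper: gathers itemprop -> text pairs from every
// span[itemprop] under `root`, skipping spans whose text is empty.
function collectItemprops(root) {
  const result = {};
  for (const el of root.querySelectorAll('span[itemprop]')) {
    const key = el.getAttribute('itemprop');
    const val = el.textContent.trim();
    if (key && val) result[key] = val; // a later duplicate key wins
  }
  return result;
}

// In a browser, serialize the object to a readable JSON string.
if (typeof document !== 'undefined') {
  console.log(JSON.stringify(collectItemprops(document), null, 2));
}
```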

You may want to normalize the keys if you use the result elsewhere, because itemprop values do not always make convenient object keys (they may be camelCase or contain non-word characters). To do that, use the following:

function normalizeObjectNotation(key){
  return key && typeof key == 'string' && /[A-Z]/.test(key) && /\W+/.test(key) == false
  ? key.trim().split(/(?=[A-Z])/).reduce((a,b)=> a+'_'+b).replace(/^\d+/, '').toLowerCase() 
  : key && typeof key == 'string' ? key.trim().replace(/\W+/g, '_').replace(/^\d+/, '').toLowerCase() 
  : 'failed_object';
}

function getItemPropsAsJSON(){
  var ob = {};
  Array.from(document.getElementsByTagName('span')).forEach(el=> {     
    var key = el.getAttribute('itemprop');
    var val = el.innerText;
    if (key && val) ob[normalizeObjectNotation(key)] = val;
  });
  return ob;
}
getItemPropsAsJSON()

/* Expected Output:

{
  "name": "Jorge Garcia - Co-Founder & Chief Technology Officer, ICONIC | Contact Information, Email Address, Phone Number, Budgets and Responsibilities",
  "email": "j****@***********.com",
  "telephone": "(347) ***-****",
  "street_address": "177 East West Colorado Boulevard",
  "address_locality": "Pasadena",
  "address_region": "CA",
  "postal_code": "91195",
  "address_country": "USA"
}

*/
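
If the nested ternary is hard to follow, the same idea can be sketched as a single chain of replacements. This is an alternative under my own assumptions (toSnakeKey is a hypothetical name), not a drop-in copy of normalizeObjectNotation: it only inserts an underscore between a lowercase letter or digit and the capital that follows it, so a run of capitals is not split letter by letter.

```javascript
// Hypothetical alternative normalizer: camelCase and non-word characters
// both end up as snake_case; non-string or blank input gives 'failed_object'.
function toSnakeKey(key) {
  if (typeof key !== 'string' || key.trim() === '') return 'failed_object';
  return key
    .trim()
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2') // streetAddress -> street_Address
    .replace(/\W+/g, '_')                   // runs of non-word chars -> "_"
    .replace(/^\d+/, '')                    // drop leading digits
    .toLowerCase();
}

console.log(toSnakeKey('streetAddress')); // "street_address"
console.log(toSnakeKey('postal-code'));   // "postal_code"
```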
Andre Bradshaw
  • Thanks for the flexible solution using JSON, and the normalizing tip, Andre. This and the other answer provided the same day were both excellent! – Glenn Gutmacher Mar 24 '20 at 01:00