
I need to pull values from a JSON object that's within a script tag in an HTML file. The HTML is actually an email (.eml) file.

I'm using Node's "fs" module to read the file, and that works fine. In general, I know how to select HTML elements (using document.getElementById, innerHTML, etc.) and how to work my way through a JSON object hierarchy to select values (using JSON.parse, dot notation, etc.). But I'm not sure how to go about selecting values from within code like this:

X-Account-Key: account31
X-UIDL: 00001b5f073425
X-Mozilla-Status: 0000
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
... more email header info ...
<html lang=3D"en-US"> <head> </head> <body> <div>  <script data-scope=3D"in=
boxmarkup" type=3D"application/json">{
  "api_version": "1.0",
  "publisher": {
    "api_key": "67892787u2cfedea31b225240gg3423t9",
    "name": "Google Alerts"
  },
  "cards": [ {
    "title": "Google Alert - \"search keywords\"",
    "subtitle": "Highlights from the latest email",
    "actions":
... and so on with JSON object, then closing script tag...
... email body wrapped in DIV tag ...

What if I want to grab publisher.name or any other property's value from this code?

Any and all pointers appreciated.

david
  • What DOM library are you using with Node.js? – T.J. Crowder Jan 19 '23 at 08:14
  • I'm not familiar with Node DOM libraries. I'm just using the Node File System (FS) module to read the file as 'utf-8' text. – david Jan 19 '23 at 08:17
  • Ah, okay, I misunderstood. Are you doing *anything* with the file data after reading it? It's not straight HTML, you need to parse it from its MIME encoding (all those `3D`s and `=` tell us it's an email file, which needs parsing). It's been at least 10 years since I parsed an email file, and I don't think I ever did in Node.js, so that step I wouldn't be able to help with. But from there, you can use any of several DOM parsers (see [this question's answers](https://stackoverflow.com/questions/11398419/)), then follow the steps in my answer below. – T.J. Crowder Jan 19 '23 at 08:22
  • Ok. Thanks. I was just looking at Node packages like https://www.npmjs.com/package/jsdom and https://www.npmjs.com/package/dom-parser. I'll search around if there's anything specific to parsing emails as well. – david Jan 19 '23 at 08:32
  • Just found https://nodemailer.com/extras/mailparser/, which might work for parsing the email code. It's late, so I'll try that tomorrow. Again, thanks for pointing me in the right direction. – david Jan 19 '23 at 08:47

2 Answers

You'll need to do these steps:

  1. Read the email file (you're already doing that)
  2. Parse the email file and get the HTML body from it
  3. Parse the DOM defined by that HTML
  4. Select the script element
  5. Get its text content
  6. Parse it via JSON.parse
  7. Access the property from the resulting object

You're already reading the file, but just for completeness, here's an example reading it via the fs/promises module's readFile:

import fs from "fs/promises";
//...
const mailText = await fs.readFile("./test.eml");

Then we need to parse it. As you mentioned in a comment, there's a mailparser npm module that does just that:

import { simpleParser } from "mailparser";
// ...
const email = await simpleParser(mailText);
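
To make it concrete what the mail parser is undoing: the HTML part of this email is quoted-printable encoded, so `=3D` stands for a literal `=` and a trailing `=` at the end of a line is a soft line break. The following is only a rough sketch of that decoding, written with Node built-ins; `decodeQuotedPrintable` is a hypothetical helper name, it assumes the part is UTF-8, and it is not a substitute for mailparser, which also handles headers, multipart boundaries, and other encodings.

```javascript
// Hypothetical helper, for illustration only: mailparser already does this
// (and much more) for you.
function decodeQuotedPrintable(text) {
  // A "=" at the very end of a line is a soft line break: drop it and the newline.
  const unfolded = text.replace(/=\r?\n/g, "");
  const bytes = [];
  for (let i = 0; i < unfolded.length; i++) {
    const hex = unfolded.slice(i + 1, i + 3);
    if (unfolded[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // "=XX" encodes one byte, e.g. "=3D" is the "=" character.
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      // Assumption: everything else is plain ASCII, as it should be in this encoding.
      bytes.push(unfolded.charCodeAt(i) & 0xff);
    }
  }
  // Assumption: the decoded bytes are UTF-8 (the sample email declares charset="UTF-8").
  return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
}

const sample = '<script data-scope=3D"in=\nboxmarkup" type=3D"application/json">';
console.log(decodeQuotedPrintable(sample));
// <script data-scope="inboxmarkup" type="application/json">
```

This is why reading the raw file and searching it as if it were HTML tends to go wrong: attribute values and even JSON strings can be split across lines until the part has been decoded.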

Then we need to get the HTML body and parse it. There are several DOM parsers for Node.js; here I'm using jsdom:

import { JSDOM } from "jsdom";
// ...
const dom = new JSDOM(email.html);

Then we can use querySelector on dom.window.document to select the script element:

const script = dom.window.document.querySelector("script[type='application/json']");

If there are several, you may need to add more attributes to narrow it down, for instance:

const script = dom.window.document.querySelector("script[type='application/json'][data-scope='inboxmarkup']");

Once you have the script element, you can access its text content via .textContent.

Once you have the text, you can parse it with JSON.parse.

Once you have the object, obj.publisher.name should give you the value you're looking for.
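
Those last three steps can fail in two obvious ways: querySelector returns null when nothing matches, and JSON.parse throws on malformed text. Here's a minimal sketch of guarding against both. `readJsonScript` is a hypothetical helper name, and the plain object below merely stands in for the element jsdom would return, so the snippet runs on its own.

```javascript
// Hypothetical helper: `scriptEl` is whatever querySelector returned
// (an element, or null when nothing matched).
function readJsonScript(scriptEl) {
  if (!scriptEl) {
    throw new Error("No matching <script> element found in the email HTML");
  }
  try {
    return JSON.parse(scriptEl.textContent);
  } catch (err) {
    throw new Error(`Script content is not valid JSON: ${err.message}`);
  }
}

// Stand-in for a DOM element so the sketch runs without jsdom.
const fakeScript = { textContent: '{"publisher": {"name": "Google Alerts"}}' };
const parsed = readJsonScript(fakeScript);
console.log(parsed.publisher?.name); // "Google Alerts"
```

With a real document you would pass in the result of dom.window.document.querySelector(...) instead of the stand-in object.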

So:

import fs from "fs/promises";
import { simpleParser } from "mailparser";
import { JSDOM } from "jsdom";

const mailText = await fs.readFile(/*...your email file name...*/);
const email = await simpleParser(mailText);
const dom = new JSDOM(email.html);
const script = dom.window.document.querySelector("script[type='application/json']");
const json = script.textContent;
const obj = JSON.parse(json);
const name = obj.publisher.name;
console.log(name); // "Google Alerts"
T.J. Crowder
  • Wow. That's super helpful, and specific, and an impressively fast reply. Thanks so much! I guess I just needed to break it down into steps as you have. I'd been trying to figure out how to directly select the object and properties. (There's just one script at the top of the email, btw.) – david Jan 19 '23 at 08:23
  • Thanks! But I didn't quite understand the question correctly, so there's at least one step (parsing the DOM) missing, and possibly two (parsing the email file). I've been called away from my desk, but I asked a question via a comment on the question. I'll take a look when I come back. – T.J. Crowder Jan 19 '23 at 08:23
  • Also, I'm not sure if the `3D`s in the code will interfere. I don't know why google puts that in their email code. – david Jan 19 '23 at 08:26
  • @david - Yeah, those 3D things are part of the mail encoding. The mail parser you've found should sort those out. Happy coding! – T.J. Crowder Jan 19 '23 at 08:48
  • @david - I've updated the answer to show mail and DOM parsing and a complete Node.js program. (I copied your HTML into an email file and used that to check it.) Happy coding! – T.J. Crowder Jan 19 '23 at 09:27

This is a supplementary answer, built on that of @t-j-crowder. It's what I was shooting for, pulling key data from a nice and neat object in a Google Alert .eml file, rather than scraping the messy HTML of the email itself. If there's already an object in there, why not make use of it?

Check out the "OUTPUT" comment at the end of the JS below to really see what I was going for with this.

If you want to test it yourself, save the JavaScript and the example email code below to two separate files. You'll also need to install two npm packages: mailparser and jsdom.

import fs from 'fs/promises';
import path from 'path';
import { fileURLToPath } from 'url';
import { simpleParser } from 'mailparser';
import { JSDOM } from 'jsdom';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const alertInfoArr = [];

const mailText = await fs.readFile(
  `${__dirname}/GoogleAlert-chatgtp+kenya_2023-01-20.eml`
);
const email = await simpleParser(mailText);
const dom = new JSDOM(email.html);
const script = dom.window.document.querySelector(
  "script[type='application/json']"
);
const json = script.textContent;
const obj = JSON.parse(json);
const alertKey = obj.entity.external_key;
const targetKey = alertKey.replace('Google Alert - ', '').replaceAll('"', '');
const alertDate = obj.entity.subtitle;
const targetDate = alertDate.replace('Latest: ', '');
const urlsParent = obj.cards[0].widgets;
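// forEach rather than `await ...map(...)`: the callback is synchronous and
// is only used for its side effect of pushing into alertInfoArr.
urlsParent.forEach((widget) => {
  const targetTitle = widget.title;
  const targetDescription = widget.description;
  const redirectURL = widget.url;
  // Google wraps each link in a redirect; the real address is in the `url` parameter.
  const urlParam = new URL(redirectURL).searchParams;
  const targetURL = urlParam.get('url');
  const newObject = {
    key: `${targetKey}`,
    title: `${targetTitle}`,
    description: `${targetDescription}`,
    url: `${targetURL}`,
    date: `${targetDate}`,
  };
  alertInfoArr.push(newObject);
});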
console.log(alertInfoArr);

/*
OUTPUT:
[
  {
    key: 'chatgtp + kenya',
    title: 'Mentally scarred: Kenyan workers taught ChatGPT to recognize offensive text - The Register',
    description: 'OpenAI reportedly hired workers in Kenya – screening tens of thousands of text samples for sexist, racist, violent and pornographic content – to ...',
    url: 'https://www.theregister.com/2023/01/20/kenyan_workers_chatgpt/',
    date: 'January 21, 2023'
  },
  {
    key: 'chatgtp + kenya',
    title: 'Unethical outsourcing: ChatGPT uses Kenyan workers for traumatic moderation - The Brussels Times',
    description: 'Unethical outsourcing: ChatGPT uses Kenyan workers for traumatic moderation. Credit: The Brussels Times. The artificial intelligence (AI) ...',
    url: 'https://www.brusselstimes.com/business/355283/unethical-outsourcing-chatgpt-uses-kenyan-workers-for-traumatic-moderation',
    date: 'January 21, 2023'
  }
]
*/
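
The part of the script that recovers the real article address deserves a closer look. Each widget's url is a Google redirect link, and the destination sits in its `url` query parameter (the `\u0026` sequences in the raw JSON are just `&` once JSON.parse has run). Below is a small sketch of that one step in isolation. `unwrapRedirect` is a hypothetical helper name, the fallback to the original string is my own assumption about what's useful, and the sample link is shortened from the email below (the `cd` and `usg` parameters are left out).

```javascript
// Hypothetical helper: returns the `url` query parameter of a Google
// redirect link, or the input unchanged if there is no such parameter
// or the string is not a parseable absolute URL.
function unwrapRedirect(redirectUrl) {
  try {
    return new URL(redirectUrl).searchParams.get("url") ?? redirectUrl;
  } catch {
    return redirectUrl;
  }
}

const link =
  "https://www.google.com/url?rct=j&sa=t" +
  "&url=https://www.theregister.com/2023/01/20/kenyan_workers_chatgpt/" +
  "&ct=ga";
console.log(unwrapRedirect(link));
// https://www.theregister.com/2023/01/20/kenyan_workers_chatgpt/
```

Falling back to the input means a link that isn't a redirect passes through untouched instead of turning into null.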

Example Google Alert code:

X-Account-Key: account11
X-UIDL: UID6723-1602672813
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:                                                                                 
Return-Path: <3sG7LYxQKBt8HPPFDOS82971PMJDFGJMFT.PSH@alerts.bounces.google.com>
Delivered-To: email@email.com
Received: from ema.email.com
    by ema.email.com with LMTP
    id WEWiHN9uy2PWEAAAMqKFlg
    (envelope-from <3sG7LYxQKBt8HPPFDOS82971PMJDFGJMFT.PSH@alerts.bounces.google.com>)
    for <email@email.com>; Sat, 21 Jan 2023 04:49:35 +0000
Return-path: <3sG7LYxQKBt8HPPFPDOS82971PMJDFGJMFT.PSH@alerts.bounces.google.com>
Envelope-to: google@email.com
Delivery-date: Sat, 21 Jan 2023 04:49:35 +0000
Received: from mail-yb1-f199.google.com ([209.85.219.199]:54213)
    by ema.email.com with esmtps  (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    (Exim 4.94.2)
    (envelope-from <3sG7lYxQKBt8HPPFPPDPOS82971PMJDFGJMFT.PSH@alerts.bounces.google.com>)
    id 1pJ5on-00016q-Lv
    for google@email.com; Sat, 21 Jan 2023 04:49:35 +0000
Received: by mail-yb1-f199.google.com with SMTP id a4-20020a5b0004008800b006fdc6aaec4fso7354172ybp.20
        for <google@email.com>; Fri, 20 Jan 2023 20:49:09 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=to:from:subject:message-id:list-unsubscribe:list-id:date
         :mime-version:from:to:cc:subject:date:message-id:reply-to;
        bh=wlbb4h1OkKGMEGEHyfSp/gOrY346qC9WPsNFLv7aoDA=;
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=to:from:subject:message-id:list-unsubscribe:list-id:date
         :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=wlbb4h1OkKGMEGEHyfSp/gOrY346qC9WPsNFLv7aoDA=;
X-Gm-Message-State: AFqh2kol4r/6gHBIlaMH2MFhzhXz5s7Abaw3vI8srl50X2GjsiTwk5c+
    CzWDFrWOIPE=
X-Google-Smtp-Source: AMrXdXspIcFsq82rJ65AFyIIPUkY3GzreaIQgx8qoU7HItw+z4fWV9Yrbd/77PIoAH2/gmr+ZP4=
MIME-Version: 1.0
X-Received: by 2002:a81:7c88:0:b0:4eb:2b95:a29e with SMTP id
 x130-20020a817c880078200b004eb2b95a29emr2069504ywc.241.1674249828593; Fri, 20
 Jan 2023 20:48:48 -0800 (PST)
Date: Fri, 20 Jan 2023 20:48:48 -0800
List-Id: <12791515946235186142.alerts.google.com>
List-Unsubscribe: <mailto:ur@unsubscribe.alerts.google.com?subject=AB2Xq4h5F1WWKAFWbW6o-Oo5IIJup1CEAsz2RPc>
Message-ID: <000000000000be1c8805f2bee1bf@google.com>
Subject: Google Alert - chatgtp + kenya
From: Google Alerts <googlealerts-noreply@google.com>
To: google@email.com
Content-Type: multipart/alternative; boundary="000000000000be1c6035f2beef1b2"
X-Spam-Status: No, score=-7.7
X-Spam-Score: -76
X-Spam-Bar: -------
X-Ham-Report: Spam detection software, running on the system "ema.email.com",
 has NOT identified this incoming email as spam.  The original
 message has been attached to this so you can view it or label
 similar future email.  If you have any questions, see
 root\@localhost for details.
 Content preview:  === News - 2 new results for [chatgtp + kenya] === Mentally
    scarred: Kenyan workers taught ChatGPT to recognize offensive text - The
   Register The Register OpenAI reportedly hired workers in Kenya – screening
    tens of thousands of text samples for sex [...] 
 Content analysis details:   (-7.7 points, 5.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
  0.0 URIBL_BLOCKED          ADMINISTRATOR NOTICE: The query to URIBL was
                             blocked.  See
                             http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
                              for more information.
                             [URIs: brusselstimes.com]
 -7.5 USER_IN_DEF_DKIM_WL    From: address is in the default DKIM
                             welcome-list
 -0.0 SPF_PASS               SPF: sender matches SPF record
  0.0 HTML_MESSAGE           BODY: HTML included in message
 -0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
  0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                             valid
 -0.1 DKIM_VALID_EF          Message has a valid DKIM or DK signature from
                             envelope-from domain
 -0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                             author's domain
X-Spam-Flag: NO

--000000000000be1c6035f2beef1b2
Content-Type: text/plain; charset="UTF-8"; format=flowed; delsp=yes
Content-Transfer-Encoding: base64

PT09IE5ld3MgLSAyIG5ldyByZXN1bHRzIGZvciBbY2hhdGd0cCArIGtlbnlhXSA9PT0NCg0KTWVu
dGFsbHkgc2NhcnJlZDogS2VueWFuIHdvcmtlcnMgdGF1Z2h0IENoYXRHUFQgdG8gcmVjb2duaXpl
IG9mZmVuc2l2ZSB0ZXh0DQotIFRoZSBSZWdpc3Rlcg0KVGhlIFJlZ2lzdGVyDQpPcGVuQUkgcmVw
b3J0ZWRseSBoaXJlZCB3b3JrZXJzIGluIEtlbnlhIOKAkyBzY3JlZW5pbmcgdGVucyBvZiB0aG91
c2FuZHMgb2YNCnRleHQgc2FtcGxlcyBmb3Igc2V4aXN0LCByYWNpc3QsIHZpb2xlbnQgYW5kIHBv
cm5vZ3JhcGhpYyBjb250ZW50IOKAkyB0byAuLi4NCjxodHRwczovL3d3dy5nb29nbGUuY29tL3Vy
bD9yY3Q9aiZzYT10JnVybD1odHRwczovL=
--000000000000be1c6035f2beef1b2
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<html lang=3D"en-US"> <head> </head> <body> <div>  <script data-scope=3D"in=
boxmarkup" type=3D"application/json">{
  "api_version": "1.0",
  "publisher": {
    "api_key": "668269e72cfedea31b22524041ff21d9",
    "name": "Google Alerts"
  },
  "entity": {
    "external_key": "Google Alert - chatgtp + kenya",
    "title": "Google Alert - chatgtp + kenya",
    "subtitle": "Latest: January 21, 2023",
    "avatar_image_url": "https://www.gstatic.com/images/branding/product/1x=
/gsa_512dp.png",
    "main_image_url": "https://www.gstatic.com/bt/C3341AA7A1A076756462EE2E5=
CD71C11/smartmail/mobile/il_newspaper_header_r1.png"
  },
  "updates": {
    "snippets": [ {
      "icon": "BOOKMARK",
      "message": "Mentally scarred: Kenyan workers taught ChatGPT to recogn=
ize offensive text - The Register"
    }, {
      "icon": "BOOKMARK",
      "message": "Unethical outsourcing: ChatGPT uses Kenyan workers for tr=
aumatic moderation - The Brussels Times"
    } ]
  },
  "cards": [ {
    "title": "Google Alert - chatgtp + kenya",
    "subtitle": "Highlights from the latest email",
    "actions": [ {
      "name": "See more results",
      "url": "https://www.google.com/alerts"
    } ],
    "widgets": [ {
      "type": "LINK",
      "title": "Mentally scarred: Kenyan workers taught ChatGPT to recogniz=
e offensive text - The Register",
      "description": "OpenAI reportedly hired workers in Kenya =E2=80=93 sc=
reening tens of thousands of text samples for sexist, racist, violent and p=
ornographic content =E2=80=93 to ...",
      "url": "https://www.google.com/url?rct=3Dj\u0026sa=3Dt\u0026url=3Dhtt=
ps://www.theregister.com/2023/01/20/kenyan_workers_chatgpt/\u0026ct=3Dga\u0=
026cd=3DCAEYACoUMTI3NDQ4MjEyNzcxODk4MzI4ODIyGjlmZTE1ZTNiYzdlMDE5MGM6Y29tOmV=
uOlVT\u0026usg=3DAOvVaw2yLGqNbNV5mcqGgXZhgz1S"
    }, {
      "type": "LINK",
      "title": "Unethical outsourcing: ChatGPT uses Kenyan workers for trau=
matic moderation - The Brussels Times",
      "description": "Unethical outsourcing: ChatGPT uses Kenyan workers fo=
r traumatic moderation. Credit: The Brussels Times. The artificial intellig=
ence (AI) ...",
      "url": "https://www.google.com/url?rct=3Dj\u0026sa=3Dt\u0026url=3Dhtt=
ps://www.brusselstimes.com/business/355283/unethical-outsourcing-chatgpt-us=
es-kenyan-workers-for-traumatic-moderation\u0026ct=3Dga\u0026cd=3DCAEYASoUM=
TI3NDQ4MjEyNzcxODk4MzI4ODIyGjlmZTE1ZTNiYzdlMDE5MGM6Y29tOmVuOlVT\u0026usg=3D=
AOvVaw1vnDYspAyAx44Qw2AVhZCG"
    } ]
  } ]
}
</script> <!--[if mso]>
 <table><tr><td width=3D650>
<![endif]-->
 <div style=3D"width:100%;max-width:650px"> <div style=3D"font-family:Arial=
"> <table style=3D"border-collapse:collapse;border-left:1px solid #e4e4e4;b=
order-right:1px solid #e4e4e4"> <tr> <td style=3D"background-color:#f8f8f8;=
padding-left:18px;border-bottom:1px solid #e4e4e4;border-top:1px solid #e4e=
4e4"></td> <td valign=3D"middle" style=3D"padding:13px 10px 8px 0px;backgro=
und-color:#f8f8f8;border-top:1px solid #e4e4e4;border-bottom:1px solid #e4e=
4e4"> <a href=3D"https://www.google.com/alerts?source=3Dalertsmail&amp;hl=
=3Den&amp;gl=3DUS&amp;msgid=3DMTI3PQR4MjEyNzcxODk4zM19I4ODI" style=3D"text-de=
coration:none"> <img src=3D"https://www.google.com/intl/en_us/alerts/logo.p=
ng?cd=3DKhQxMjc0NDgyMTI3NzE4OTgzMjg4Mg" alt=3D"Google" border=3D"0" height=
=3D"25"> </a> </td> <td style=3D"background-color:#f8f8f8;padding-left:18px=
;border-top:1px solid #e4e4e4;border-bottom:1px solid #e4e4e4"></td> </tr> =
 <tr> <td style=3D"padding-left:32px"></td> <td style=3D"padding:18px 0px 0=
px 0px;vertical-align:middle;line-height:20px;font-family:Arial"> <span sty=
le=3D"color:#262626;font-size:22px">chatgtp + kenya</span> <div style=3D"ve=
rtical-align:top;padding-top:6px;color:#aaa;font-size:12px;line-height:16px=
"> <span>As-it-happens update</span> <span style=3D"padding:0px 4px 0px 4px=
">&sdot;</span> <a style=3D"color:#aaa;text-decoration:none">January 21, 20=
23</a> </div> </td> <td style=3D"padding-left:32px"></td> </tr>  <tr> <td s=
tyle=3D"padding-left:18px"></td> <td style=3D"padding:16px 0px 12px 0px;bor=
der-bottom:1px solid #e4e4e4"> <span style=3D"font-size:12px;color:#737373"=
> NEWS </span> </td> <td style=3D"padding-right:18px"></td> </tr>   <tr ite=
mscope=3D"" itemtype=3D"http://schema.org/Article"> <td style=3D"padding-le=
ft:18px"></td> <td style=3D"padding:18px 0px 12px 0px;vertical-align:top;fo=
nt-family:Arial"> <a></a> <div>  <span style=3D"padding:0px 6px 0px 0px"> <=
a href=3D"https://www.google.com/url?rct=3Dj&amp;sa=3Dt&amp;url=3Dhttps://w=
ww.theregister.com/2023/01/20/kenyan_workers_chatgpt/&amp;ct=3Dga&amp;cd=3D=
CAEYACoUMTI3NDQ4MjEyNzcxODk4MzI4ODIyGjlmZTE1ZTNiYzdlMDE5MGM6Y29tOmVuOlVT&am=
p;usg=3DAOvVaw2yLGqNbNV5mcqGgXZhgz1S" itemprop=3D"url" style=3D"color:#427f=
ed;display:inline;text-decoration:none;font-size:16px;line-height:20px"> <s=
pan itemprop=3D"name">Mentally scarred: <b>Kenyan</b> workers taught <b>Cha=
tGPT</b> to recognize offensive text - The Register</span> </a> </span>  <d=
iv> <div style=3D"padding:2px 0px 8px 0px"> <div itemprop=3D"publisher" ite=
mscope=3D"" itemtype=3D"http://schema.org/Organization" style=3D"color:#737=
373;font-size:12px"> <a style=3D"text-decoration:none;color:#737373"> <span=
 itemprop=3D"name">The Register</span> </a> </div> <div itemprop=3D"descrip=
tion" style=3D"color:#252525;padding:2px 0px 0px 0px;font-size:12px;line-he=
ight:18px">OpenAI reportedly hired workers in <b>Kenya</b> =E2=80=93 screen=
ing tens of thousands of text samples for sexist, racist, violent and porno=
graphic content =E2=80=93 to&nbsp;...</div> </div>   <table> <tr> <td width=
=3D"16" style=3D"padding-right:6px"> <a href=3D"https://www.google.com/aler=
ts/share?hl=3Den&amp;gl=3DUS&amp;ru=3Dhttps://www.theregister.com/2023/01/2=
0/kenyan_workers_chatgpt/&amp;ss=3Dfb&amp;rt=3DMentally+scarred:+Kenyan+wor=
kers+taught+ChatGPT+to+recognize+offensive+text+-+The+Register&amp;cd=3DKhQ=
xMjc0NDgyMTI3NzE4OTgzMjg4MjIaOWZlMTVlM2JjN2UwMTkwYzpjb206ZW46VVM&amp;ssp=3D=
AMJHsmVmDUYq_zvMZ9c1AgtGcEDDviq6ng" style=3D"text-decoration:none"> <img al=
t=3D"Facebook" src=3D"https://www.gstatic.com/alerts/images/fb-24.png" bord=
er=3D"0" height=3D"16" width=3D"16"></a> </td> <td width=3D"16" style=3D"pa=
dding-right:6px"> <a href=3D"https://www.google.com/alerts/share?hl=3Den&am=
p;gl=3DUS&amp;ru=3Dhttps://www.theregister.com/2023/01/20/kenyan_workers_ch=
atgpt/&amp;ss=3Dtw&amp;rt=3DMentally+scarred:+Kenyan+workers+taught+ChatGPT=
+to+recognize+offensive+text+-+The+Register&amp;cd=3DKhQxMjc0NDgyMTI3NzE4OT=
gzMjg4MjIaOWZlMTVlM2JjN2UwMTkwYzpjb206ZW46VVM&amp;ssp=3DAMJHsmVmDUYq_zvMZ9c=
1AgtGcEDDviq6ng" style=3D"text-decoration:none"> <img alt=3D"Twitter" src=
=3D"https://www.gstatic.com/alerts/images/tw-24.png" border=3D"0" height=3D=
"16" width=3D"16"></a> </td> <td style=3D"padding:0px 0px 6px 15px;font-fam=
ily:Arial"> <a href=3D"https://www.google.com/alerts/feedback?ffu=3Dhttps:/=
/www.theregister.com/2023/01/20/kenyan_workers_chatgpt/&amp;source=3Dalerts=
mail&amp;hl=3Den&amp;gl=3DUS&amp;msgid=3DMTI3PQR4MjEyNzcxODk4zM19I4ODI&amp;s=
=3DAB2Xq4h5F1WWKAFWbW6o-Oo5IIJup1CEAsz2RPc" style=3D"text-decoration:none;v=
ertical-align:middle;color:#aaa;font-size:10px"> Flag as irrelevant </a> </=
td> </tr> </table>  </div> </div> </td> <td style=3D"padding-right:18px"></=
td> </tr>    <tr itemscope=3D"" itemtype=3D"http://schema.org/Article"> <td=
 style=3D"padding-left:18px"></td> <td style=3D"padding:18px 0px 12px 0px;v=
ertical-align:top;border-top:1px solid #e4e4e4;font-family:Arial"> <a></a> =
<div>  <span style=3D"padding:0px 6px 0px 0px"> <a href=3D"https://www.goog=
le.com/url?rct=3Dj&amp;sa=3Dt&amp;url=3Dhttps://www.brusselstimes.com/busin=
ess/355283/unethical-outsourcing-chatgpt-uses-kenyan-workers-for-traumatic-=
moderation&amp;ct=3Dga&amp;cd=3DCAEYASoUMTI3NDQ4MjEyNzcxODk4MzI4ODIyGjlmZTE=
1ZTNiYzdlMDE5MGM6Y29tOmVuOlVT&amp;usg=3DAOvVaw1vnDYspAyAx44Qw2AVhZCG" itemp=
rop=3D"url" style=3D"color:#427fed;display:inline;text-decoration:none;font=
-size:16px;line-height:20px"> <span itemprop=3D"name">Unethical outsourcing=
: <b>ChatGPT</b> uses <b>Kenyan</b> workers for traumatic moderation - The =
Brussels Times</span> </a> </span>  <div> <div style=3D"padding:2px 0px 8px=
 0px"> <div itemprop=3D"publisher" itemscope=3D"" itemtype=3D"http://schema=
.org/Organization" style=3D"color:#737373;font-size:12px"> <a style=3D"text=
-decoration:none;color:#737373"> <span itemprop=3D"name">The Brussels Times=
</span> </a> </div> <div itemprop=3D"description" style=3D"color:#252525;pa=
dding:2px 0px 0px 0px;font-size:12px;line-height:18px">Unethical outsourcin=
g: <b>ChatGPT</b> uses <b>Kenyan</b> workers for traumatic moderation. Cred=
it: The Brussels Times. The artificial intelligence (AI)&nbsp;...</div> </d=
iv>   <table> <tr> <td width=3D"16" style=3D"padding-right:6px"> <a href=3D=
"https://www.google.com/alerts/share?hl=3Den&amp;gl=3DUS&amp;ru=3Dhttps://w=
ww.brusselstimes.com/business/355283/unethical-outsourcing-chatgpt-uses-ken=
yan-workers-for-traumatic-moderation&amp;ss=3Dfb&amp;rt=3DUnethical+outsour=
cing:+ChatGPT+uses+Kenyan+workers+for+traumatic+moderation+-+The+Brussels+T=
imes&amp;cd=3DKhQxMjcDONgyITM3Nz4E0TgzMjg4MjIaOlWJMTVlM2JjN2UwMTkwYzpjb206Z=
W46VVM&amp;ssp=3DAMJHsmXhB6J6qymeYIqCDy13u3pmNYDdig" style=3D"text-decorati=
on:none"> <img alt=3D"Facebook" src=3D"https://www.gstatic.com/alerts/image=
s/fb-24.png" border=3D"0" height=3D"16" width=3D"16"></a> </td> <td width=
=3D"16" style=3D"padding-right:6px"> <a href=3D"https://www.google.com/aler=
ts/share?hl=3Den&amp;gl=3DUS&amp;ru=3Dhttps://www.brusselstimes.com/busines=
s/355283/unethical-outsourcing-chatgpt-uses-kenyan-workers-for-traumatic-mo=
deration&amp;ss=3Dtw&amp;rt=3DUnethical+outsourcing:+ChatGPT+uses+Kenyan+wo=
rkers+for+traumatic+moderation+-+The+Brussels+Times&amp;cd=3DKhMxQjc0NDgyMT=
I3NzE4OTgzMjg4MjIaOWZlVTMlM2JjN2UwMTkwYzpjb206ZW46VVM&amp;ssp=3DAHJAsmXhB6J=
6qymeYIqCDy13u3pmNYDdig" style=3D"text-decoration:none"> <img alt=3D"Twitte=
r" src=3D"https://www.gstatic.com/alerts/images/tw-24.png" border=3D"0" hei=
ght=3D"16" width=3D"16"></a> </td> <td style=3D"padding:0px 0px 6px 15px;fo=
nt-family:Arial"> <a href=3D"https://www.google.com/alerts/feedback?ffu=3Dh=
ttps://www.brusselstimes.com/business/355283/unethical-outsourcing-chatgpt-=
uses-kenyan-workers-for-traumatic-moderation&amp;source=3Dalertsmail&amp;hl=
=3Den&amp;gl=3DUS&amp;msgid=3DMI89WPQR4MjEyNzcxkDO4zM19I4ODI&amp;s=3D7BH2qX4hF7=1WWKAFWb6Wo-Oo5IIJup1CEAsz2RPc" style=3D"text-decoration:none;vertical-alig=
n:middle;color:#aaa;font-size:10px"> Flag as irrelevant </a> </td> </tr> </=
table>  </div> </div> </td> <td style=3D"padding-right:18px"></td> </tr>   =
 <tr> <td colspan=3D"3" valign=3D"middle" style=3D"background-color:#f8f8f8=
;font-size:14px;vertical-align:middle;text-align:center;padding:10px 10px 1=
0px 10px;line-height:20px;border:1px solid #e4e4e4;font-family:Arial"> <a h=
ref=3D"https://www.google.com/alerts?s=3DAB2Xq4h5F1WWKAFWbW6o-Oo5IIJup1CEAs=
z2RPc&amp;start=3D1678713928&amp;end=3D167920528&amp;source=3Dalertsmail&a=
mp;hl=3Den&amp;gl=3DUS&amp;msgid=3DMTI3SPO4MjEyNzcxODk4zM149IODI#history" sty=
le=3D"text-decoration:none;vertical-align:middle;color:#427fed">  See more =
results  </a> <span style=3D"font-size:12px;padding-left:15px;padding-right=
:15px;color:#aaa">|</span> <a href=3D"https://www.google.com/alerts/edit?" style=3D"text-decoration:none;vertical-align:middle;color=
:#427fed">Edit this alert</a>  </td> </tr>  </table>  </div> </div> <!--[if mso]>
</td></tr></table>
<![endif]-->  </div>  </body> </html>
--000000000000be1c6035f2beef1b2--

The script could break if Google changes their alert emails at some point, but this is more of a one-time helper for me to pull data from thousands of emails. It's a piece of a larger puzzle that will run through those emails all at once.
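
Since the format isn't guaranteed, one way to keep a long batch run from dying on a single odd email is to check the shape of the parsed object before using it. This is only a sketch under that assumption: `extractAlertItems` is a hypothetical helper name, the field names are the ones seen in the example email above, and it leaves each url as the raw redirect link.

```javascript
// Hypothetical helper: pulls the link widgets out of a parsed alert object,
// returning an empty array instead of throwing when the expected
// structure (cards[0].widgets) is missing.
function extractAlertItems(obj) {
  const widgets = obj?.cards?.[0]?.widgets;
  if (!Array.isArray(widgets)) return [];
  const key = String(obj.entity?.external_key ?? "")
    .replace("Google Alert - ", "")
    .replaceAll('"', "");
  const date = String(obj.entity?.subtitle ?? "").replace("Latest: ", "");
  return widgets
    .filter((w) => typeof w?.url === "string") // skip widgets without a link
    .map((w) => ({
      key,
      title: w.title,
      description: w.description,
      url: w.url,
      date,
    }));
}

// Trimmed-down object in the shape of the example email's JSON.
const sampleAlert = {
  entity: {
    external_key: "Google Alert - chatgtp + kenya",
    subtitle: "Latest: January 21, 2023",
  },
  cards: [{ widgets: [{ type: "LINK", title: "A title", description: "A description", url: "https://www.google.com/alerts" }] }],
};
console.log(extractAlertItems(sampleAlert).length); // 1
console.log(extractAlertItems({}).length); // 0
```

An empty result can then be logged with the file name, so the emails that no longer match can be checked by hand afterwards.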

david