I have an email in HTML format containing text and tables that I am trying to parse using JavaScript. The text parsing works fine: I just have to run a regex to get what I need from the content, e.g.:

var name     = mail.bodyText.match(/Name:\s*(.*)/);

Now the table part is quite tricky. Say the table contains 3 columns and I only want to retrieve the data listed under the first column. When I type the following:

var column1Data = mail.bodyText.match(/Column1([\s\S]*?)/);
    if (column1Data) {
        var column1DataSplit = sources[1].split("\n");}
}

Data is not retrieved.

Example of an HTML table:

[image: example table]

Any idea how to retrieve the bodyText of an HTML table?

Thanks.

keenthinker
  • 1
    Not enough information to answer. Can you post an example of the html in question? – Rocky Sims Oct 28 '18 at 09:29
  • Hi Rocky, I added an image showing an example of that table. – Jane Dublin Oct 28 '18 at 09:37
  • Is it an option to use jQuery? – keenthinker Oct 28 '18 at 09:39
  • Can you not do something with `document.querySelector('table td:eq(0)')` or something? Also.. hard to answer if we don't know what the HTML looks like. – putvande Oct 28 '18 at 09:42
  • I must admit I have no experience in using jQuery. My code is very simple, as the one above. Should I use jQuery for that kind of parsing? – Jane Dublin Oct 28 '18 at 09:46
  • 2
    Sorry, didn't make myself clear. I meant to ask that you post an example of the html as text. – Rocky Sims Oct 28 '18 at 09:51
  • True it would have been easier for me at first to have the text format, but I don't have it. – Jane Dublin Oct 28 '18 at 09:57
  • where is `sources` defined? What is the actual thing you're trying to find in the table? – lacostenycoder Oct 28 '18 at 10:03
  • Actually, it is very simple. As I simply could parse the horizontal html lines. I would like to parse the columns (in a vertical way). Say I'd like to have the list of Names retrieved from the table. And that would be the list {Jane DUBLIN, Spencer HOWLING} under "Name" field. That would have been easier for me if the data was in one line (html wouldn't have bothered, it would have been parsed properly with the code above). – Jane Dublin Oct 28 '18 at 10:12
  • 3
    Please read this: https://stackoverflow.com/a/1732454/1220550 – Peter B Oct 28 '18 at 10:39
  • 1
    Questions seeking help ("why isn't/how to make this code working?") must include the desired behavior, a specific problem or error and the **shortest code necessary to reproduce it** in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a [mcve]. – Asons Oct 28 '18 at 10:44

2 Answers

Why not just find the name column in the table?

// Select the first cell of every row
var td = document.querySelectorAll('td:nth-child(1)');
// Start at index 1 to skip the header row
for (var i = 1; i < td.length; i++) {
  var nameData = td[i].innerHTML;
  if (nameData) {
    console.log(nameData);
  }
}
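
Note that `document` above is the page the script runs in. If the table arrives as an HTML string, as with an email body, the string has to be parsed into a document first. Below is a rough sketch, assuming a browser environment where `DOMParser` is available and assuming the header is the first `td` in the column. `firstColumnValues` and `emailHtml` are hypothetical names made up for illustration, not part of any mail API.

```javascript
// Collect the text of the first cell in every row, optionally skipping the header.
// `root` is anything that offers querySelectorAll (a Document or an Element).
function firstColumnValues(root, skipHeader = true) {
  const cells = Array.from(root.querySelectorAll('td:nth-child(1)'));
  return cells
    .slice(skipHeader ? 1 : 0)               // drop the header cell if requested
    .map((cell) => cell.textContent.trim())  // plain text, without markup
    .filter((text) => text !== '');          // ignore empty cells
}

// Browser-only usage: DOMParser is not available in plain Node.js.
if (typeof DOMParser !== 'undefined') {
  // Made-up sample; the real email HTML was not posted.
  const emailHtml =
    '<table><tr><td>Name</td><td>City</td></tr>' +
    '<tr><td>Jane DUBLIN</td><td>x</td></tr></table>';
  const doc = new DOMParser().parseFromString(emailHtml, 'text/html');
  console.log(firstColumnValues(doc));
}
```

If the email contains several tables, it may be safer to pass a single table element as `root`, since only the very first cell found is treated as a header.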
lacostenycoder
  • Jane isn't trying to test if `'Jane Dublin'` is present in the table. The goal is to extract the names. See Jane's comment on the question that starts 'Actually, it is very simple.'. – Rocky Sims Oct 28 '18 at 10:43
  • see final updated answer, should give you just what you need. – lacostenycoder Oct 28 '18 at 10:57
  • yeah in the beginning I wasn't sure if you were looking for something more specific. – lacostenycoder Oct 28 '18 at 10:59
  • Hi guys, thanks a lot for your swift replies. But let me put it otherwise: I have no idea what the HTML tables codes are, so my parser acts as if it was the dumbest one ever; it grabs random emails bodyMails and extract whatever it wants from it. Now, luckily there are simple parts that can be extracted easily because the data usually follows its associated tag, say: **Name : Jane**. But when it comes to tables with for one tag you have multiple rows, it becomes tricky and seemingly undoable with only regex queries. – Jane Dublin Oct 28 '18 at 11:19
  • 1
    If you "have no idea what the HTML tables codes are" how can you possibly parse them? Either you have the HTML text to parse or you don't have anything to parse. If you want help with this you're going to need to provide at least a few examples (HTML text) of what you want to parse (in your question). Perhaps we can generalize from multiple examples to get a solution that will work for you in all (or at least most) cases. – Rocky Sims Oct 29 '18 at 13:19
-2

You haven't provided enough detail in your question for me to answer your specific problem, but it sounds like you may be asking how to grab the HTML from the first column in each row of a table. Here is how you could do that:

<table id="myTable">
  <tr>
    <td>r1c1</td>
    <td>r1c2</td>
  </tr>
  <tr>
    <td>r2c1</td>
    <td>r2c2</td>
  </tr>
</table>

<script type="text/javascript">
  var tds = document.querySelectorAll('td:nth-child(1)');
    for (const td of tds) {
    console.log(td.innerHTML);
  }
</script>

Console output:

"r1c1"
"r2c1"

https://codepen.io/rockysims/pen/NOJMap?editors=1011
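
As for the regex in the question: a lazy group such as `([\s\S]*?)` with nothing after it is satisfied by matching zero characters, so the capture comes back as an empty string. The result is then read from `sources`, which is not the variable that was assigned. Here is a minimal sketch, assuming `bodyText` is plain text with a `Column1` label followed by one value per line and then a blank line. The sample string and the blank-line terminator are made up for illustration, since the real email body was not posted.

```javascript
// Hypothetical sample text standing in for mail.bodyText.
const bodyText = 'Column1\nJane DUBLIN\nSpencer HOWLING\n\nFooter text';

// Lazy group with nothing after it: it matches as few characters as possible, i.e. none.
const lazy = bodyText.match(/Column1([\s\S]*?)/);
console.log(JSON.stringify(lazy[1])); // ""

// Give the lazy group something to stop at (here: a blank line or the end of input).
const bounded = bodyText.match(/Column1\n([\s\S]*?)(?:\n\n|$)/);
const column1Values = bounded ? bounded[1].split('\n') : [];
console.log(column1Values); // [ 'Jane DUBLIN', 'Spencer HOWLING' ]
```

This only helps when the column values really do sit on consecutive lines in the text. For an actual HTML table, selecting cells as shown above is likely to be more reliable than a regex.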

Rocky Sims