0

I'm trying to extract numbers such as the ones listed below from my Gmail messages using Google Apps Script.

2,495.00
1,594
3,777.23
642.00

This is the code:

function myFunction() {

  var sheet = SpreadsheetApp.getActiveSheet();
  var threads = GmailApp.search('subject:(Transaction) after:2016/7/31 before:2016/8/10');

  for (var i=0; i<threads.length; i++)
  {
    var messages = threads[i].getMessages();
    var tmp;

    for (var j=0; j<messages.length; j++)
    {
      var content = messages[j].getBody();
      var subject = messages[j].getSubject();
      var date = messages[j].getDate();
      Logger.log(content);

      if (content)
      {
        tmp = content.match(/\d+(,\d+)*(\.\d+(e\d+)?)?/);
        var number = (tmp && tmp[j]) ? tmp[j] : 'No number';
        sheet.appendRow([number, subject, date]);
      }
      else
      {
        sheet.appendRow([content, subject, date]);
      }

    }
  }
}

I've been getting mixed results. For some messages this works as intended but for some it completely skips the numbers from the messages. I'm a newbie to JS/GAS and I thought the problem was in the regex but I'm not sure. Any help in this would be appreciated.

hpb
  • 111
  • 1
  • 7

1 Answers1

0

you are facing two trouble here:
the regex you are using don't look optimised (but neither what you are looking is clear a number like 1,594 should not be found if you are also looking at number that look like that 642.00). Nevertheless you could use a a regex like Shekhar Khairnar proposed in comment or something similar (the gat the end is important as there is more than one number in your mail).
The second trouble is in the line var number = (tmp && tmp[j]) ? tmp[j] : 'No number';. Why is there a jvar in this line? jis reference to the for loop --> number of messages, nothing to do with the occurences in your message.
What I can propose you is something like that:

function myFunction() {

  var sheet = SpreadsheetApp.getActiveSheet();
  var threads = GmailApp.search('test');
  var re = /(?:\d{1,3}[,])*\d{1,3}\.{0,1}\d{1,3}/g;

  for (var i=0; i<threads.length && i<5; i++) // added a condition because I didn't wanted to have too many results
  {
    var messages = threads[i].getMessages();
    var tmp;

    for (var j=0; j<messages.length; j++)
    {
      var content = messages[j].getPlainBody(); //.getBody();
      var subject = messages[j].getSubject();
      var date = messages[j].getDate();
      //Logger.log(content);

      if (content)
      {
        tmp = content.match(re); // /\d+(,\d+)*(\.\d+(e\d+)?)?/);
        var number = tmp || ['No number']; // result of tmp is either null or an array --> if it's null then it will take the value 'no number'
        Logger.log(number);
        sheet.appendRow([number.join(" | "), subject, date]);
      }
      else
      {
        sheet.appendRow([content, subject, date]);
      }

    }
  }
}
Harold
  • 3,297
  • 1
  • 18
  • 26
  • This article describes your challenge in a lot of detail: http://stackoverflow.com/questions/5917082/regular-expression-to-match-numbers-with-or-without-commas-and-decimals-in-text It recommends this regex: (\d*\.?\d+|\d{1,3}(,\d{3})*(\.\d+)?) I think you would also want to make the regex match multiline and global and then iterate through all of the matches, to get all of the numbers in the body. Read the article carefully, because matching numbers is a non-trivial problem and even this regex does match some cases which you might not want to include. – alfiethecoder Aug 30 '16 at 13:46
  • Thanks Harold and alfiethecoder! That really helped! – hpb Aug 30 '16 at 18:29
  • How would I modify the regex to find only numbers with decimals. This essentially would exclude numbers such as `1,524` – hpb Aug 30 '16 at 18:42
  • try something like that: `/(?:\d{1,3},)*\d{1,3}\.\d{2}/g` personally I use this site https://regex101.com/ to build my regex – Harold Aug 31 '16 at 07:25