I need to extract the due date from the SMS below, which also contains a billed date.

"Bill dated 27-May-18 for your airtel fixedline/broadband ID ##### has been sent at abc@abc.com from ebill@abc.com. Due amount: Rs 2,358.82, due date: 15-Jun-18"

There can be many different SMS formats, and some of them do not mention the billed date or the due date explicitly. So previously we were just looking for any date format with this regex:

"((\d{4}|\d{2})-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-(\d{4}|\d{2}))|((\d{4}|\d{2})-\d{2}-(\d{4}|\d{2}))|((\d{4}|\d{2})\/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\/(\d{4}|\d{2}))|((\d{4}|\d{2})\/(\d{3}|\d{2})\/(\d{4}|\d{2}))|([0-3][0-9]((st)|(nd)|(rd)|(th))((\s)?)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))" 

It works well when the SMS contains only a due date and no bill date. But for the SMS above it always returns the bill date instead of the due date.

I want a solution that returns a date the way the regex above does, but that returns the date following the "due date" label whenever a due date is mentioned.

A possible solution I see is:

  1. check for the due date label - (due date: )
  2. if found, return the date next to the label - (?<=due date: ).*
  3. else (step 2 found nothing), return any date that matches the regex above
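
The three steps above can be sketched as two separate searches. This is a minimal Python sketch, not the asker's .NET code: the names `DATE`, `DUE_DATE_RE`, `ANY_DATE_RE` and `find_date` are made up for illustration, and the date pattern is reduced to the `dd-Mon-yy` form only, on the assumption that the full multi-format pattern from the question would be substituted in practice.

```python
import re

# Simplified date pattern for illustration (e.g. 27-May-18 or 27-May-2018);
# the full multi-format pattern from the question could be used instead.
DATE = (r"(?:\d{4}|\d{2})-"
        r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-"
        r"(?:\d{4}|\d{2})")

# Steps 1 and 2: a date that directly follows a "due date:" label.
DUE_DATE_RE = re.compile(r"due\s+date\s*:\s*(" + DATE + r")")
# Step 3: any date at all.
ANY_DATE_RE = re.compile(DATE)


def find_date(sms):
    """Return the labelled due date if present, else the first date, else None."""
    labelled = DUE_DATE_RE.search(sms)
    if labelled:
        return labelled.group(1)
    fallback = ANY_DATE_RE.search(sms)
    return fallback.group(0) if fallback else None


sms = ("Bill dated 27-May-18 for your airtel fixedline/broadband ID ##### "
       "has been sent at abc@abc.com from ebill@abc.com. "
       "Due amount: Rs 2,358.82, due date: 15-Jun-18")
print(find_date(sms))                     # 15-Jun-18
print(find_date("Bill dated 27-May-18"))  # 27-May-18
```

Capturing the date in a group after the label, rather than using a lookbehind followed by `.*`, keeps the result limited to the date itself instead of everything to the end of the line.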
  • Here you go `(?:.*duedate:\s*(15-Jun-18)|(otherdate))` The influence of the one `.*` extends into both worlds neo. –  Jun 11 '18 at 18:17
  • Use [this regex](https://regex101.com/r/gYQ0LN/1). You might also want to use `RegexOptions.Singleline` if the text spans across multiple lines. Note that you need `.*` on both ends of the first date capturing pattern to consume the whole string. Also, FYI, you do not need to escape `/` in a .NET regex ever. – Wiktor Stribiżew Jun 11 '18 at 22:59
  • @sln Thanks for the response. I am a beginner in regex. Can you please tell me how I can construct a regex which will check if the "due date:" text is present in the SMS and return the date next to it, and otherwise return the date next to "Due amount:"? I tried this, (?(due date: )((?<=due date: ).*) | ((?<=Due amount).*)) but it does not seem to work. Where am I going wrong with it? – Ranjan Patra Jun 12 '18 at 08:52
  • @wiktor, Thanks for the response. I am a beginner in regex. Can you please tell me how I can construct a regex which will check if the "due date:" text is present in the SMS and return the date next to it, and otherwise return the date next to "Due amount:"? I tried this, (?(due date: )((?<=due date: ).*) | ((?<=Due amount).*)) but it does not seem to work. Where am I going wrong with it? – Ranjan Patra Jun 12 '18 at 08:57
  • @sln I am unable to make your regex work for me. Can you please explain it in more detail? It seems closer to what I need. – Ranjan Patra Jun 12 '18 at 08:58
  • @RanjanPatra I already added [this https://regex101.com/r/gYQ0LN/1 link](https://regex101.com/r/gYQ0LN/1) in my comment. This is your solution. – Wiktor Stribiżew Jun 12 '18 at 08:59
  • @wiktor I tried your solution in my .NET program, but it seems to select the wrong date (the billed date) for the above SMS instead of the due date. This is because the Regex class in the program returns an array of matches and the program selects the first Match to display. We cannot change this part of the program. Is it possible to get only one match, the one after "due date:", when there are two dates in the SMS? Thanks in advance. – Ranjan Patra Jun 12 '18 at 09:30
  • See https://ideone.com/Tnp5jY, let me know if it is OK. – Wiktor Stribiżew Jun 12 '18 at 10:37

1 Answer

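All you have to do is make the due date label optional at the beginning: (?:[\S\s]*(due\s+date\s*:)\s*)?
The greedy [\S\s]* first runs to the end of the string, then backtracks until it finds the label,
and the date right after the label is then matched.

If the label is not found, the first date found in the string is matched instead.

To tell whether the due date label was matched, check whether group 1 matched.
If it's null, the label wasn't matched.
Note that when the label is matched, the overall match (group 0) spans everything from the start of the string up to the end of the date, so read the date from its own group rather than from the whole match.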
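
The behaviour described above can be checked with a reduced pattern. The following is a Python sketch under stated assumptions: `PATTERN` and `extract` are hypothetical names, only the `dd-Mon-yy` date form is included, and the constructs used (`[\S\s]*`, an optional non-capturing group, capture groups) are ones that .NET regexes also provide.

```python
import re

# Reduced date pattern (dd-Mon-yy / dd-Mon-yyyy) for illustration only.
DATE = (r"(?:\d{4}|\d{2})-"
        r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-"
        r"(?:\d{4}|\d{2})")

# Optional greedy prefix: group 1 is the "due date:" label, group 2 the date.
PATTERN = re.compile(r"(?:[\S\s]*(due\s+date\s*:)\s*)?(" + DATE + r")")


def extract(sms):
    """Return (date, label_was_matched); date is None if no date is found."""
    m = PATTERN.search(sms)
    if m is None:
        return None, False
    # group(1) is None when the optional prefix did not take part in the match.
    return m.group(2), m.group(1) is not None


sms = ("Bill dated 27-May-18 for your airtel fixedline/broadband ID ##### "
       "has been sent at abc@abc.com from ebill@abc.com. "
       "Due amount: Rs 2,358.82, due date: 15-Jun-18")
print(extract(sms))                     # ('15-Jun-18', True)
print(extract("Bill dated 27-May-18"))  # ('27-May-18', False)
```

Because the prefix consumes the text before the label, the whole match for the first SMS is the entire message; code that displays the whole match instead of group 2 would need a different approach, for example a lookbehind.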

The regex:

@"(?:[\S\s]*(due\s+date\s*:)\s*)?((\d{4}|\d{2})-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-(\d{4}|\d{2}))|((\d{4}|\d{2})-\d{2}-(\d{4}|\d{2}))|((\d{4}|\d{2})/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/(\d{4}|\d{2}))|((\d{4}|\d{2})/(\d{3}|\d{2})/(\d{4}|\d{2}))|([0-3][0-9]((st)|(nd)|(rd)|(th))((\s)?)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))"

Readable version:

    (?:
         [\S\s]*  
         ( due \s+ date \s* : )        # (1)
         \s* 
    )?

    (                             # (2 start)
         ( \d{4} | \d{2} )             # (3)
         -
         (                             # (4 start)
              Jan
           |  Feb
           |  Mar
           |  Apr
           |  May
           |  Jun
           |  Jul
           |  Aug
           |  Sep
           |  Oct
           |  Nov
           |  Dec
         )                             # (4 end)
         -
         ( \d{4} | \d{2} )             # (5)
    )                             # (2 end)
 |  (                             # (6 start)
         ( \d{4} | \d{2} )             # (7)
         - \d{2} -
         ( \d{4} | \d{2} )             # (8)
    )                             # (6 end)
 |  (                             # (9 start)
         ( \d{4} | \d{2} )             # (10)
         /
         (                             # (11 start)
              Jan
           |  Feb
           |  Mar
           |  Apr
           |  May
           |  Jun
           |  Jul
           |  Aug
           |  Sep
           |  Oct
           |  Nov
           |  Dec
         )                             # (11 end)
         /
         ( \d{4} | \d{2} )             # (12)
    )                             # (9 end)
 |  (                             # (13 start)
         ( \d{4} | \d{2} )             # (14)
         /
         ( \d{3} | \d{2} )             # (15)
         /
         ( \d{4} | \d{2} )             # (16)
    )                             # (13 end)
 |  (                             # (17 start)
         [0-3] [0-9] 
         (                             # (18 start)
              ( st )                        # (19)
           |  ( nd )                        # (20)
           |  ( rd )                        # (21)
           |  ( th )                        # (22)
         )                             # (18 end)
         (                             # (23 start)
              ( \s )?                       # (24)
         )                             # (23 end)
         (                             # (25 start)
              Jan
           |  Feb
           |  Mar
           |  Apr
           |  May
           |  Jun
           |  Jul
           |  Aug
           |  Sep
           |  Oct
           |  Nov
           |  Dec
         )                             # (25 end)
    )                             # (17 end)

Output

 **  Grp 0 -  ( pos 0 : len 159 ) 
Bill dated 27-May-18 for your airtel fixedline/broadband ID ##### has been sent at abc@abc.com from ebill@abc.com. Due amount: Rs 2,358.82, due date: 15-Jun-18  
 **  Grp 1 -  ( pos 140 : len 9 ) 
due date:  
 **  Grp 2 -  ( pos 150 : len 9 ) 
15-Jun-18  
 **  Grp 3 -  ( pos 150 : len 2 ) 
15  
 **  Grp 4 -  ( pos 153 : len 3 ) 
Jun  
 **  Grp 5 -  ( pos 157 : len 2 ) 
18  
 **  Grp 6 -  NULL 
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  NULL 
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  NULL 
 **  Grp 14 -  NULL 
 **  Grp 15 -  NULL 
 **  Grp 16 -  NULL 
 **  Grp 17 -  NULL 
 **  Grp 18 -  NULL 
 **  Grp 19 -  NULL 
 **  Grp 20 -  NULL 
 **  Grp 21 -  NULL 
 **  Grp 22 -  NULL 
 **  Grp 23 -  NULL 
 **  Grp 24 -  NULL 
 **  Grp 25 -  NULL
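
One caveat about the pattern above: alternation has the lowest precedence, so the optional due date prefix belongs only to the first alternative (the `dd-Mon-yy` form). A due date written in one of the other formats is therefore not preferred over an earlier bill date. The Python sketch below illustrates this with just two of the date formats; `UNGROUPED` and `GROUPED` are hypothetical names, and wrapping all alternatives in one group is a suggested adjustment, not part of the original answer (it also changes the group numbering).

```python
import re

MON = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
D1 = r"(?:\d{4}|\d{2})-" + MON + r"-(?:\d{4}|\d{2})"  # e.g. 27-May-18
D2 = r"(?:\d{4}|\d{2})/\d{2}/(?:\d{4}|\d{2})"          # e.g. 15/06/18
PREFIX = r"(?:[\S\s]*(due\s+date\s*:)\s*)?"

# Same shape as the answer's pattern: the prefix applies to D1 only.
UNGROUPED = re.compile(PREFIX + "(" + D1 + ")|(" + D2 + ")")
# All date alternatives wrapped in one group: the prefix applies to each.
GROUPED = re.compile(PREFIX + "(" + D1 + "|" + D2 + ")")

sms = "Bill dated 27-May-18, due date: 15/06/18"

m = UNGROUPED.search(sms)
ungrouped_date = m.group(2) or m.group(3)
grouped_date = GROUPED.search(sms).group(2)

print(ungrouped_date)  # 27-May-18 (the bill date)
print(grouped_date)    # 15/06/18  (the due date)
```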