I have raw text like the following.

Example 1:

order pickup details>>> >>> pick up before the store closes on Wed, Apr 11>>> >>> 
scan in-store for order pickup>>> >>> >>> 9019560>>>    Warrenville Target Store>>> 28201 Diehl Rd, Warrenville, IL 60555

Example 2:

    Come to collect your order in the next 2 days (after that it'll be cancelled). Your payment will be processed as soon as you collect your order.>> >>  >> 
Pickup Store:>> >> Lush Naperville <https://click.e.lush.com/?qs=cbb6669d6dac2528c696ad86bb5b6fd3ebae7703b0b05e2a40dbc6705d0f3325fe891806d5a629b19dbc9b8e9d36e46e7d944d995ea896decd587d210c8bb838>>> 
119 S. Main Street , Naperville, IL 60540>> >> Choose between curbside or in-store pickup.>>

How can the address part be extracted from the above text in Node.js?

Here is what is actually happening: I receive order confirmation emails from different stores, and I need to get the store address from each email. Each store uses a different format for its order confirmation email.

I get this raw text after converting the email template to text format.

The question below is related to my problem, but it is in Python.

How can I extract address from raw text using NLTK in python?

Is there any way to detect the address in the text? I am new to this.

Lakshmi
  • Hi ! Would you provide us at least 3 or 4 text examples to understand the rule ? Did you try `regex` extraction ? – Philippe Oct 07 '21 at 07:01
  • I added another example, I didn't use the regex extraction @Philippe – Lakshmi Oct 07 '21 at 07:21
  • @Philippe Can you tell me the example regex for address detection in the string – Lakshmi Oct 07 '21 at 11:24
  • Hi! I tried hard but didn't succeed. I advise you to add the `regex` tag to your question; some experts here know this topic better :) Sorry! – Philippe Oct 07 '21 at 11:27
  • I would look at some of the npm packages like https://www.npmjs.com/package/parse-address – scarpacci Oct 10 '21 at 00:00

1 Answer


A RegExp for the above type of address format in Node.js is:

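// Raw text from the converted email (the line break is written as \n).
const text = "pick up before the store closes on Wed, Apr 11>>> >>> \n" +
  "scan in-store for order pickup>>> >>> >>> 9019560>>>    Warrenville Target Store>>> 28201 Diehl Rd, Warrenville, IL 60555";

// House number, street, city, two-letter state, and 5-digit ZIP code.
const regex = /[0-9]{1,5} .+, .+, [A-Z]{2} [0-9]{5}/;

// match() returns null when nothing matches, so guard before indexing.
const address = text.match(regex);
console.log("Address", address ? address[0] : null);

// Address 28201 Diehl Rd, Warrenville, IL 60555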

Explanation:

[0-9]{1,5}: 1 to 5 digits, the house number

(space): a space between the number and the street name

.+: street name, any character for any number of occurrences

,: a comma and a space before the city

.+: city, any character for any number of occurrences

,: a comma and a space before the state

[A-Z]{2}: exactly 2 uppercase chars from A to Z

[0-9]{5}: 5 digits

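text.match(regex) returns an array whose first element is the matched address, or null if nothing matches. Without the g flag only the first occurrence is returned; add the g flag to the regex to get every occurrence.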
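What `match` returns depends on whether the regex carries the `g` flag. The following is a small sketch of that difference; the `sample` text and the variable names are made up for illustration and are not taken from a real email.

```javascript
// Same pattern as in the answer, once without and once with the g flag.
const firstRe = /[0-9]{1,5} .+, .+, [A-Z]{2} [0-9]{5}/;
const allRe = /[0-9]{1,5} .+, .+, [A-Z]{2} [0-9]{5}/g;

// Hypothetical text with one address per line.
const sample =
  "Store A>>> 28201 Diehl Rd, Warrenville, IL 60555\n" +
  "Store B>>> 119 S. Main Street, Naperville, IL 60540";

const first = sample.match(firstRe);            // first match only, plus index metadata
const all = sample.match(allRe);                // every match, as plain strings
const none = "no address here".match(firstRe);  // null when nothing matches

console.log(first[0]); // 28201 Diehl Rd, Warrenville, IL 60555
console.log(all.length); // 2
console.log(none); // null
```

Because `.` does not match a line break, each match stays within a single line of the input.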

However, this regex only works for this particular address format.
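One way to make the match a little less brittle is to split the raw text on the `>>` separators first and then test each piece on its own, so that unrelated numbers earlier in the text cannot start a match. The sketch below is only that, a sketch: `extractAddresses` is a hypothetical helper, the pattern still assumes US-style "number street, city, ST 12345" addresses, and the URL in `example2` is a shortened stand-in for the real tracking link.

```javascript
// Hypothetical helper: split on ">" runs and line breaks, keep address-like segments.
function extractAddresses(rawText) {
  // Anchored: the whole segment must look like "number street, city, ST 12345".
  // Spaces around the commas are tolerated ("Main Street , Naperville").
  const addressRe = /^\d{1,5}\s+[^,]+?\s*,\s*[^,]+?\s*,\s*[A-Z]{2}\s+\d{5}(?:-\d{4})?$/;
  return rawText
    .replace(/<https?:\/\/[^>\s]*>/g, " ") // drop links written as <https://...>
    .split(/>+|\r?\n/)                      // break on ">" runs and line breaks
    .map((segment) => segment.trim())
    .filter((segment) => addressRe.test(segment));
}

const example1 =
  "order pickup details>>> >>> pick up before the store closes on Wed, Apr 11>>> >>> \n" +
  "scan in-store for order pickup>>> >>> >>> 9019560>>>    Warrenville Target Store>>> 28201 Diehl Rd, Warrenville, IL 60555";

const example2 =
  "Come to collect your order in the next 2 days (after that it'll be cancelled).>> >>  >> \n" +
  "Pickup Store:>> >> Lush Naperville <https://example.com/track?id=1>>> \n" +
  "119 S. Main Street , Naperville, IL 60540>> >> Choose between curbside or in-store pickup.>>";

console.log(extractAddresses(example1)); // [ '28201 Diehl Rd, Warrenville, IL 60555' ]
console.log(extractAddresses(example2)); // [ '119 S. Main Street , Naperville, IL 60540' ]
```

This still fails for addresses that span several segments or omit the state and ZIP code; for those cases a dedicated parser, such as the npm package suggested in the comments, may be a better fit.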

Lakshmi