
I have scanned an image and extracted its text using the Google Vision API. Now I'm having trouble extracting the name and address from the scanned text. With some regexes I'm able to detect the street number and ZIP code, but not the whole address or the name.

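    import re

    def find_between_r(s, first, last):
        """Return the substring of s between the first occurrence of `first`
        and the next occurrence of `last` after it, or "" if either is missing."""
        try:
            start = s.index(first) + len(first)
            end = s.index(last, start)
            return s[start:end]
        except ValueError:
            return ""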
    text=""" 17000 AJHshkjadj dakd ext ESTES RICHMOND VA 23230 On Coll UNIFORM STRAIGHT BILL OF LADING - Original - Not Negotiable - Short Form (EXLA, 3901 WEST BROAD STREET Date 10/12/2017 OBOL No W093556Shippers No P.0. No 16846 very shipments, the letters 'COD appear befo For Payment Bill To Bill being paid by Shipper Consignee ENCANTADA RESORT Ken Smith 407-997-3731 Sp tio 3070 SECRET LAKE DR Ruwes Turiff EXLA 105. KISSIMMEE FL 34747 Shipper WINDWARD DESIGN NO ACCESSORIAL SERVICESADDED WITHOUT PRIOR APPROVAL FROM WINDWARD 941-359-0890 1130 COMMERCE BLVD N SARASOTA FL 34243 ird Is M trial E cy P le # O00 000 0000 NOTE: Liability Limitation for loss or damage on this shipment may be applicable. See 49 U.S.C. 14706 (c)(1XA) and (B No. Pkgs HMI Kind or Peak ange. Description of Articles, Special Marks and Exceptions NMFC Declared Valius TW. (Sub Com) | Chass/Rate Ohk22 CT PATIO FURNITURE 1400 200.0 22 1400 Quote# 4867496 APPOINTMENT CHARGE LIFTGATE DELIVERY CHARGE Rade doled val jeclared Excess WARNING Additional dam Mc LiabiRef IOS the d rges Advanced S Received $. to apply NOTE: Wh JOTE: Commodi requnng speci Subject to Sect 7 of Condit this shipmentrequired specific handling marked and to be delivered to the consignee without recourse wning the agreed or declared ith ordinary ignor, the ignor shall sign the property. The agreed or declared value See Sec NMFC 360. follohereby sp Sally stated by the The fibe booxes used thms shipment Innke del shipp 6pecif forth in the box Lake itbour payme freight and all other lawfuland all other repair Consolidated US NMFC | charge the shi PT is byBill of Lacing shall in the prepayment of the charges on the property described hereof. igned, destination RECEIVED. ly determined have been agroced upo icable otherwise the classif and rules (EstaExpros Linehave been castablished by shippes. stThe property described abovein apparent good ords, and codi of packages unknown) rked, ared destined otion said Juleotherwis Ily agreed. 
    property 1y porton desertion and as to cocb party a serty, that every performed thereunder shall be sult all the 1d oCodhi Bill of Lacing the National Motor Freight Cassifit 100-X and also agreed liable or any consequental damages arising from the delaysery dates (Subject of any app! Gold M Service Ageroement) SHIPPER CERTIFICATION CARRIER CERTIFICATION Gignatueits agreement to all o orching to the applicable regulations Express Lines-EXLA ized Signature Date (Dae Iolel who ret TPMLD Colon coDfee & Shipper O Collect On Delivery C.OD, Amount Certified Check Freight Charges are PREPAID unless marked collect CHECK BOX to be paid by { Consignee Consignee Check Accepted IF COLLECT Mark Ig the PLTS STC PC and Loose Place Guaranteed Sticker Here Tsos ITIL TS Elections or AO eControl IDOT- Pro# 000000028388396 PAGE 624318 O489b24AL8"""

    data = []
    # Grab everything from a Shipper/Consignee/FROM/TO keyword to the end of its line
    Ship_Cons = re.findall(r'\b(?=SHIP|Ship|SHIPPER|Shipper|ONSIGNEE|Onsignee|CONSIGNEE|Consignee|FROM|TO).*', text)
    val = " ".join(map(str, Ship_Cons))
    # State code followed by a ZIP code, e.g. "FL 34747"
    zip_code = re.findall(
        r"((?=AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|"
        r"NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)[A-Z]{2}[, ])"
        r"(\d{5}(?:-\d{4})?|\d{4}(?:-\d{4})?|\d{3}(?:\s\d{2})|\d{3}(?:\s\d{1}\s\d{1})"
        r"|\d{2,5}(?:\s\d{2,5})(?:-\d{4})?)", val)
    # print(zip_code)
    for item in zip_code:
        data.append("".join(item))
    # A standalone 4- or 5-digit number, or a 4-digit number preceded by two words
    address = re.findall(r"\s\d{4}\s|\w*[a-z]\s\w*[a-z]\s\d{4}\s|\s\d{5}\s", val)
    print("Address", address)
    print(find_between_r(val, address[0], data[0]))

I'm getting

    SECRET LAKE DR Ruwes Turiff EXLA 105. KISSIMMEE

as the output from the code above. How can I avoid unnecessary values like "Turiff EXLA 105." in the address? I'm also not able to get the name. Can anyone help me solve this? Thank you.
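Address parsing can't be solved in general, but for this particular bill-of-lading layout a narrower heuristic gets closer to the expected output. Instead of slicing between the first 4-digit number and the first state/ZIP pair, the sketch below anchors on the "Consignee" label, takes the text up to the first digit as the name, requires the street to end in a known suffix (DR, ST, BLVD, ...), and takes the city to be the run of all-caps words immediately before the state and ZIP, skipping at most 40 characters of OCR noise in between. The suffix list, the 40-character window and the helper names (parse_party, ADDRESS_RE, NAME_RE) are my own assumptions, not part of the Vision API or any library.

```python
import re

# Assumed, non-exhaustive list of USPS street suffixes; extend as needed.
SUFFIXES = r"(?:STREET|ST|AVENUE|AVE|BOULEVARD|BLVD|DRIVE|DR|ROAD|RD|LANE|LN|COURT|CT|WAY|PKWY|HWY|CIR|PL)"
# The 50 US states plus DC.
STATES = (
    r"(?:AL|AK|AZ|AR|CA|CO|CT|DE|DC|FL|GA|HI|ID|IL|IN|IA|KS|KY|LA|ME|MD|MA|MI|MN|MS|MO|"
    r"MT|NE|NV|NH|NJ|NM|NY|NC|ND|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VT|VA|WA|WV|WI|WY)"
)

ADDRESS_RE = re.compile(
    r"\b(?P<number>\d{1,6})\s+"                         # house number, e.g. 3070
    r"(?P<street>(?:[A-Z]+\s+)*?" + SUFFIXES + r"\b"    # all-caps words ending in a suffix
    r"(?:\s+[NSEW]\b)?)"                                # optional trailing direction, e.g. N
    r".{0,40}?"                                         # tolerate a short run of OCR noise
    r"\b(?P<city>[A-Z]+(?:\s+[A-Z]+)*)\s+"              # all-caps city right before the state
    r"(?P<state>" + STATES + r")\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)\b"
)

# Hypothetical name rule: everything after the label up to the first number
# (on this form that number is the contact's phone number).
NAME_RE = re.compile(r"\s*(?P<name>\D+?)\s+\d")


def parse_party(text, label):
    """Best-guess name and address for the party introduced by `label`.

    Returns a dict with number, street, city, state, zip and name keys,
    or None if the label or a matching address cannot be found.
    """
    label_match = re.search(r"\b" + re.escape(label) + r"\b", text)
    if label_match is None:
        return None
    start = label_match.end()
    address = ADDRESS_RE.search(text, start)
    if address is None:
        return None
    name = NAME_RE.match(text, start)
    result = address.groupdict()
    result["name"] = name.group("name") if name else None
    return result


# Excerpt of the OCR text from the question.
sample = (
    "For Payment Bill To Bill being paid by Shipper Consignee ENCANTADA RESORT "
    "Ken Smith 407-997-3731 Sp tio 3070 SECRET LAKE DR Ruwes Turiff EXLA 105. "
    "KISSIMMEE FL 34747 Shipper WINDWARD DESIGN"
)
party = parse_party(sample, "Consignee")
print(party["street"], party["city"])  # SECRET LAKE DR KISSIMMEE
print(party["name"])                   # ENCANTADA RESORT Ken Smith
```

Run against the full text from the question, parse_party(text, "Consignee") should find the same party, because the first "Consignee" in that string is the one before ENCANTADA RESORT. It is still a guess: the name comes back as company and contact together, multi-line or non-US addresses are not handled, and parse_party(text, "Shipper") picks up the wrong block, because the first standalone "Shipper" is the "paid by Shipper Consignee" checkbox rather than the shipper's address box.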

  • First off, good luck. Address parsing is not 100% possible as there are way too many formats and variables to take into account; there are, however, best guess methods that can be used. Also, what's the exact expected output? – ctwheels Oct 18 '17 at 15:50
  • Expected output: SECRET LAKE DR KISSIMMEE @ctwheels – Ankita Agharkar Oct 18 '17 at 15:52
  • It's not really possible with the way that addresses are (*un*)structured, unfortunately. The only way I can think of doing this is creating an OCR template, which, unfortunately, I don't believe Vision provides. There may be other methods possible with Vision, but I don't know enough about it to help you any further. Maybe someone else has an idea, but regex is definitely not the best tool (or a tool that should ever be used) for this job. [How to parse freeform address](https://stackoverflow.com/questions/11160192/how-to-parse-freeform-street-postal-address-out-of-text-and-into-components) – ctwheels Oct 18 '17 at 16:00
  • Thank you @ctwheels – Ankita Agharkar Oct 20 '17 at 05:08
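The OCR-template idea from the comments can be approximated by hand, because Vision reports a bounding polygon for each piece of detected text, not just the flat string. If every bill of lading uses the same layout, you can define fixed regions for the consignee's name and address and keep only the words whose centre falls inside them, which drops neighbouring noise such as "Turiff EXLA 105." without any address regex. This is a minimal sketch that assumes you have already reduced Vision's annotations to Word(text, x, y) centre points; Word, Region, the template coordinates and the sample word positions are all hypothetical and made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Word(NamedTuple):
    """Hypothetical: one OCR word reduced to the centre of its bounding box."""
    text: str
    x: float
    y: float


class Region(NamedTuple):
    """Hypothetical: a rectangle on the form, in the same units as Word.x / Word.y."""
    left: float
    top: float
    right: float
    bottom: float


def words_in_region(words: List[Word], region: Region, line_tol: float = 10.0) -> List[str]:
    """Return the words inside `region`, grouped into lines and read left to right."""
    inside = [w for w in words
              if region.left <= w.x <= region.right and region.top <= w.y <= region.bottom]
    inside.sort(key=lambda w: w.y)
    lines: List[List[Word]] = []
    for w in inside:
        # A word within line_tol of the current line's first word joins that line.
        if lines and abs(w.y - lines[-1][0].y) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [" ".join(word.text for word in sorted(line, key=lambda word: word.x))
            for line in lines]


def extract_fields(words: List[Word], template: Dict[str, Region]) -> Dict[str, List[str]]:
    """Apply every region of a template and return the text lines found in each."""
    return {field: words_in_region(words, region) for field, region in template.items()}


# Hypothetical pixel coordinates for the consignee box of one bill-of-lading layout.
CONSIGNEE_TEMPLATE = {
    "name": Region(0, 310, 380, 335),
    "address": Region(0, 340, 380, 395),
}

# Made-up word positions, loosely based on the OCR text in the question.
sample_words = [
    Word("Consignee", 40, 300),
    Word("ENCANTADA", 60, 320), Word("RESORT", 150, 321), Word("Ken", 230, 320), Word("Smith", 280, 319),
    Word("3070", 60, 350), Word("SECRET", 110, 351), Word("LAKE", 170, 349), Word("DR", 210, 350),
    Word("Turiff", 430, 350), Word("EXLA", 490, 350), Word("105.", 540, 350),  # neighbouring box
    Word("KISSIMMEE", 60, 380), Word("FL", 150, 380), Word("34747", 190, 381),
]

fields = extract_fields(sample_words, CONSIGNEE_TEMPLATE)
print(fields["name"])     # ['ENCANTADA RESORT Ken Smith']
print(fields["address"])  # ['3070 SECRET LAKE DR', 'KISSIMMEE FL 34747']
```

Fixed pixel regions only work if scans share layout and scale; in practice you would probably express the regions relative to the page size or to the detected position of the "Consignee" label, and then run a small regex over each region's lines rather than over the whole page.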
