I'm near a total outsider of programming, just interested in it. I work in a Shipbrokering company and need to match between positions (which ship will be open at where, when) and orders (what kind of ships will be needed at where, when for what kind of employment). And we send and receive such info (positions and orders) by emails to and from our principals and co-brokers. There are thousands of such emails each day. We do the matching by reading the emails manually.
I want to build an app to do the matching for us.
One important part of this app will do the information extraction from email text.
==> My question is how do I use Python to extract unstructured info into structured data.
Sample email of an order [annotation in the brackets, but is not included in the email]:
Email Subject: 20k dwt requirement, 20-30/mar, Santos-Conti
Content:
Acct ABC [Account Name]
Abt 20,000 MT Deadweight [Size of Ship Needed]
Delivery to make Santos [Delivery Point/Range, Owners will deliver the ship to Charterers here]
Laycan 20-30/Mar [Laycan (the time spread in which delivery can be accepted]
1 time charter with grains [What kind of Empolyment/Trade, Cargo]
Duration about 35 days [Duration]
Redelivery 1 safe port Continent [Redelivery Point/Range, Charterers will redeliver the ship back to Owners here.]
Broker name/email/phone...
End Email
Same email above can be written in many different ways - some writes in one line, some use l/c instead of laycan... And there are emails for positions with ship's name, open port, date range, ship's deadweight and other specs.
How can I extract the info and put it into structured data, with Python? Let's say I have put all email contents into text files. Thanks.