Please see the temporary solution at the end.

Summary (added 12/24/22 for clarification):

USPS's tracking API is not returning responses in the format shown in its documentation. The actual format makes it difficult to extract the event date because there is no EventDate XML element. Worst case, I can use regex, but I was wondering whether there is a way to receive API responses as shown in USPS's documentation.

Details

In USPS's Track and Confirm API documentation page 19, the sample response shows <TrackSummary> with child elements (<EventTime>, <EventDate>, etc.):

Screenshot of USPS's sample response

Here's USPS's sample response in text:

<TrackResponse>
 <TrackInfo ID=" XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ">
 <GuaranteedDeliveryDate>June 24, 2022</GuaranteedDeliveryDate>
 <TrackSummary>
 <EventTime>9:00 am</EventTime>
 <EventDate>June 22, 2022</EventDate>
 <Event>Delivered, To Agent</Event>
 <EventCity>AMARILLO</EventCity>
 <EventState>TX</EventState>
 <EventZIPCode>79109</EventZIPCode>
 <EventCountry/>
 <FirmName/>
 <Name>RXXXXXX XXXXXXX</Name>
 <AuthorizedAgent>false</AuthorizedAgent>
 <DeliveryAttributeCode>23</DeliveryAttributeCode>
 <GMT>14:00:00</GMT>
 <GMTOffset>-05:00</GMTOffset>
 </TrackSummary>

However, when I make the call, the actual XML response lacks these child elements within TrackSummary:

<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
    <TrackInfo ID="9405511206213782679396">
        <TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
        <TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
        <TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
        <TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
        <TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
    </TrackInfo>
</TrackResponse>

This can be reproduced with Lob's USPS Postman workspace.

The problem I'm trying to solve is obtaining the date from the TrackSummary data, which now requires regex since USPS's API is not returning an EventDate child element.

Is there an option when making the request to return these helpful XML child elements? I couldn't find one in the documentation and the sample responses I've seen all contain these child elements.

I've tried forming the request in Python and with Lob's USPS workspace and both XML responses lack the TrackSummary child elements.

Long-term solution (in progress 12/26/22)

@Parfait pointed out that I should use the Package Tracking “Fields” API instead of the Package Track API.

Here's how I'm currently forming the XML request with Package Track API:

from lxml import etree

def generate_url_tracking(tracking_numbers: list[str]) -> str:
    """generate the USPS tracking request url
    :param: tracking_numbers - list of strings of tracking numbers
    :return url: str tracking url for calling the USPS API
    """
    xml = generate_xml_tracking(tracking_numbers)
    url = f"{base_url}{url_vars['track']}{xml}"
    return url

def generate_xml_tracking(tracking_numbers: list[str]) -> str:
    """
    Generate USPS track and confirm API xml
    :param tracking_numbers: list of strings of tracking numbers
    :return: xml string
    """
    xml = etree.Element("TrackRequest", {"USERID": config("USPS_USER")})
    # loop through tracking numbers
    for tracking in tracking_numbers:
        etree.SubElement(xml, "TrackID", {"ID": tracking})
    xml_string = etree.tostring(xml, encoding="utf8", method="xml").decode()
    return xml_string

I'll update this to the Package Tracking “Fields” API request when I get time.

Temporary Solution (12/25/22)

Until USPS's actual responses match their API docs, this solution extracts the last updated date from <TrackSummary> for several different statuses (pre-shipment, delivered, RTS, etc.).

The TRACK_SUMMARIES dict has the different statuses it's tested against. Some statuses without dates (no_info, out_for_delivery_no_date) return None.

import re
from datetime import datetime
from typing import Optional

from dateutil.parser import ParserError, parse

TRACK_SUMMARIES = {
    "delivered": """Your
     item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
    "out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
    "out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
    "arrived_at_post_office": """Arrived at Post Office,
     Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
    "acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
    "pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
    "rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
     because of an incorrect address.""",
    "no_info": "The Postal Service could not locate the tracking information for your request",
    "label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
    "forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
        in REDDING, CA. This was because of forwarding instructions or because the
        address or ZIP Code on the label was incorrect.
        """,
}

def get_last_updated(track_summary: str) -> Optional[datetime]:
    """Takes the USPS TrackSummary string and return the last updated datetime"""
    # remove the zip code since it interferes with the date parser
    track_summary = re.sub(r"\d{5}", "", track_summary)
    months_regex = "January|February|March|April|May|June|July|August|September|October|November|December"
    first_result = re.search(rf"(?={months_regex}).*", track_summary)
    # return early if there's no Month
    if not first_result:
        return
    first_result = first_result.group()
    # some summaries have am/pm and some don't
    result_for_parser = re.search(r".*(?<=am|pm)", first_result)
    if result_for_parser:
        result_for_parser = result_for_parser.group()
    else:
        result_for_parser = first_result
    try:
        # fuzzy parsing is required for dates in certain summaries
        result = parse(result_for_parser, fuzzy=True)
    except ParserError:
        return
    return result

Sources:

Using the dateutil parser
Regex for finding months

Parfait
  • So your problem not a parsing problem. It is no child data problem. Correct? – Bench Vue Dec 24 '22 at 15:41
  • Correct. I think parsing or regex are brittle solutions. If USPS's actual responses matched their API docs, it would solve the problem. But parsing/regex may be the best workaround for now. – Nathan Smeltzer Dec 25 '22 at 14:04
  • Reading the USPS API docs, there are multiple versions of the API where different requests return different responses. Your current response is result of the section 2.0 _Package Track API_ (see sample request on p.2 limited to `TrackID` field). But to get the p.19 response you need to run the section 3.0 _Package Tracking "Fields" API_ (see sample request on p.9 requiring more fields). – Parfait Dec 26 '22 at 00:37
  • Thanks @Parfait. This is what I was looking for! It was an error on my part not reading the full documentation. I'll update my question to include how I'm currently generating the xml for the request in python. Then, once I figure it out, I'll post the updated xml request. – Nathan Smeltzer Dec 26 '22 at 12:30

2 Answers

1

xml.etree.ElementTree does a good job of finding a child element by XPath.

It provides only limited support for XPath expressions for locating elements in a tree, but that is enough to find the TrackSummary data.

To find the first TrackSummary element anywhere below the root:

root.find(".//TrackSummary").text ->
Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.

Python demo:

import xml.etree.ElementTree as ET
import datetime

document = """\
<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
    <TrackInfo ID="9405511206213782679396">
        <TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
        <TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
        <TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
        <TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
        <TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
    </TrackInfo>
</TrackResponse>
"""

def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""

root = ET.fromstring(document)

# the timestamp sits between ' on ' and the first '.' in the summary text
summary = root.find(".//TrackSummary").text
date_time_obj = datetime.datetime.strptime(find_between(summary, ' on ', '.'), '%B %d, %Y at %I:%M %p')
print('Date:', date_time_obj.date())
print('Time:', date_time_obj.time())
print('Date-time:', date_time_obj)

Result

$ python track-summary.py
Date: 2022-12-23
Time: 12:40:00
Date-time: 2022-12-23 12:40:00
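
The real response above also mixes two date styles in its <TrackDetail> lines ("December 23, 2022" and "12/22/2022"). As a rough sketch that is not part of the demo above, the helper below tries both styles using only the standard library. The name parse_detail_timestamp and the two format strings are assumptions based solely on the sample lines in the question; other USPS wordings may not match, in which case it returns None.

```python
import re
from datetime import datetime
from typing import Optional

# Matches either "December 23, 2022, 4:49 am" or "12/22/2022, 9:41 pm"
_STAMP = re.compile(
    r"([A-Z][a-z]+ \d{1,2}, \d{4}|\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2} [ap]m)"
)
# One strptime format per date style seen in the sample response
_FORMATS = ("%B %d, %Y %I:%M %p", "%m/%d/%Y %I:%M %p")


def parse_detail_timestamp(detail: str) -> Optional[datetime]:
    """Return the first timestamp found in a TrackDetail string, or None."""
    match = _STAMP.search(detail)
    if match is None:
        return None
    stamp = f"{match.group(1)} {match.group(2)}"
    for fmt in _FORMATS:
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            continue  # not this style (or not a real month name); try the next
    return None


details = [
    "Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER",
    "In Transit to Next Facility, 12/22/2022, 9:41 pm",
    "Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112",
]
for detail in details:
    print(parse_detail_timestamp(detail))
```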

Update: regular expression parsing

Based on the Temporary Solution (12/25/22) in your updated question, I added a parsing step that uses the re library.

Code

import re
import numpy as np
from datetime import date, time, datetime

def get_date(date_string):
    months = np.array(['January','February','March','April','May','June','July','August','September','October','November','December'])
    pattern = re.compile(r'(January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})')
    match = re.search(pattern, date_string)
    if not match:
        d = None
    else:
        month_data = match.groups()[0]
        month = np.where(months==month_data)[0][0] + 1
        day = int(match.groups()[1])
        year = int(match.groups()[2])
        try:
            d = date(year, month, day)
        except ValueError:
            d = None  # or handle error in a different way
    return d

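def get_hour_min(hour, min, am_pm):
    """Convert a 12-hour clock reading to [hour, minute] on the 24-hour clock."""
    hour = int(hour)
    min = int(min)
    if am_pm == 'pm' and hour != 12:
        hour += 12  # 1 pm - 11 pm -> 13 - 23
    elif am_pm == 'am' and hour == 12:
        hour = 0  # 12 am is midnight
    return [hour, min]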

def get_time(date_string):
    pattern = re.compile(r'(\d{2}|\d{1})\:(\d{2})\s*(am|pm)')
    matches = re.findall(pattern, date_string)
    if (len(matches) == 2):
        hour, min = get_hour_min(matches[0][0], matches[0][1], matches[0][2])
        start_t = time(hour, min, 0)
        hour, min = get_hour_min(matches[1][0], matches[1][1], matches[1][2])
        end_t = time(hour, min, 0)
        return [start_t, end_t]

    match = re.search(pattern, date_string)
    if not match:
        t = None
    else:
        hour, min = get_hour_min(match.groups()[0], match.groups()[1], match.groups()[2])
        try:
            t = time(hour, min, 0)
        except ValueError:
            t = None  # or handle error in a different way
    return [t, None]

TRACK_SUMMARIES = {
    "delivered": """Your
     item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
    "out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
    "out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
    "arrived_at_post_office": """Arrived at Post Office,
     Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
    "acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
    "pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
    "rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
     because of an incorrect address.""",
    "no_info": "The Postal Service could not locate the tracking information for your request",
    "label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
    "forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
        in REDDING, CA. This was because of forwarding instructions or because the
        address or ZIP Code on the label was incorrect.
        """,
}

tracks = {}
# parsing and tuple list by key ( example : delivered, out_for_delivery and so on )
for key in TRACK_SUMMARIES:
    value = TRACK_SUMMARIES[key].replace("\n", "")
    found_date = get_date(value)
    start_time, end_time = get_time(value)
    tracks[key] = [ found_date, start_time, end_time, value ]
    # print(key, '->', value)
    # if (found_date != None):
    #     print('found date: ' + found_date.strftime("%m/%d/%Y"))
    # if (start_time != None):
    #     if(end_time == None):
    #         print('time: ' + start_time.strftime("%H:%M:%S"))
    #     else:
    #         print('start time: ' + start_time.strftime("%H:%M:%S") + ' end time: ' + end_time.strftime("%H:%M:%S"))
    # print('=========================================================================')

# decoding from tuple list by key ( tracks['delivered'], tracks['out_for_delivery'] and so on )
for key in tracks.keys():
    found_date, start_time, end_time, value = tracks[key]
    
    found_date = found_date.strftime("%m/%d/%Y") if found_date != None else None
    start_time = start_time.strftime("%H:%M:%S") if start_time != None else None
    end_time = end_time.strftime("%H:%M:%S") if end_time != None else None

    print(value)
    print(key)
    if (found_date != None):
        print('found date: ' + found_date)
    if (start_time != None):
        if(end_time == None):
            print('time: ' + start_time)
        else:
            print('start time: ' + start_time + ' end time: ' + end_time)
    print('------------------------------------------------------------------------')

Result

$ python reg-express.py
Your     item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.
delivered
found date: 12/24/2022
time: 10:23:00
------------------------------------------------------------------------
Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.
out_for_delivery
found date: 12/13/2021
time: 06:10:00
------------------------------------------------------------------------
Out for Delivery, Expected Delivery Between 9:45am and 1:45pm
out_for_delivery_no_date
start time: 09:45:00 end time: 13:45:00
------------------------------------------------------------------------
Arrived at Post Office,     Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER
arrived_at_post_office
found date: 12/11/2021
time: 21:23:00
------------------------------------------------------------------------
Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313
acceptance
found date: 12/10/2021
time: 12:54:00
------------------------------------------------------------------------
Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021
pre_shipment
found date: 12/27/2021
------------------------------------------------------------------------
Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402     because of an incorrect address.
rts
found date: 01/31/2022
time: 09:14:00
------------------------------------------------------------------------
The Postal Service could not locate the tracking information for your request
no_info
------------------------------------------------------------------------
A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON
label_prepared
found date: 12/16/2021
time: 10:47:00
------------------------------------------------------------------------
Your item was forwarded to a different address at 5:13 pm on January 4, 2022        in REDDING, CA. This was because of forwarding instructions or because the        address or ZIP Code on the label was incorrect.
forwarded
found date: 01/04/2022
time: 17:13:00
------------------------------------------------------------------------

Date/time patterns

I extracted the date and time patterns below from your TRACK_SUMMARIES dictionary. Some lines have no date, and one has a "Between" time range.

10:23 am on December 24, 2022
December 13, 2021, 6:10 am
Between 9:45am and 1:45pm
December 10, 2021, 12:54 pm
December 27, 2021
January 31, 2022 at 9:14 am
at 10:47 am on December 16, 2021
at 5:13 pm on January 4, 2022

Date parsing

(January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})
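
As a small illustration of what this pattern captures, here is a sketch that runs it against two of the summaries from the question and prints the groups. The constant name DATE_PATTERN is only a label chosen for this example.

```python
import re

# Same date pattern as above: month name, day (1 or 2 digits), 4-digit year
DATE_PATTERN = re.compile(
    r"(January|February|March|April|May|June|July|August|September|October|November|December)"
    r"\s(\d{2}|\d{1})\,\s(\d{4})"
)

two_digit_day = DATE_PATTERN.search(
    "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313"
)
print(two_digit_day.groups())  # ('December', '10', '2021')

one_digit_day = DATE_PATTERN.search(
    "Your item was forwarded to a different address at 5:13 pm on January 4, 2022"
)
print(one_digit_day.groups())  # ('January', '4', '2022')
```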

Matched item with groups, as used in the code.

Time parsing

(\d{2}|\d{1})\:(\d{2})\s*(am|pm)
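
A similar sketch for the time pattern, using re.findall so that the "Between" summary yields two matches (start and end of the delivery window). TIME_PATTERN is again just a label for this example.

```python
import re

# Same time pattern as above: hour, minute, then am/pm with optional whitespace
TIME_PATTERN = re.compile(r"(\d{2}|\d{1})\:(\d{2})\s*(am|pm)")

single = TIME_PATTERN.findall(
    "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON"
)
print(single)  # [('10', '47', 'am')]

window = TIME_PATTERN.findall(
    "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm"
)
print(window)  # [('9', '45', 'am'), ('1', '45', 'pm')]
```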

Matched item with groups, as used in the code.

References

Find string between two substrings

Converting Strings Using datetime

Regexper

regular expression 101

Bench Vue
  • Hi @bench-vue. I wish USPS's package tracking API returned exactly in the format shown in your example. Sadly, it's returning without children elements in the `<TrackSummary>`: ```Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination. ``` Is your USPS API response copy/pasted from somewhere or is this what you're actually receiving when calling the API? – Nathan Smeltzer Dec 24 '22 at 15:30
  • I have no experience USPS API, I just imagine from your pointing documentation. So you can test it or not? – Bench Vue Dec 24 '22 at 15:34
  • Can you posting your real response data? – Bench Vue Dec 24 '22 at 15:36
  • Sure, it's in the original question as it's too long to post here. – Nathan Smeltzer Dec 24 '22 at 15:42
  • @NathanSmeltzer, thanks I saw your real XML file. There are 8+1 date, which date want to pickup or all of it? – Bench Vue Dec 24 '22 at 15:50
  • Just the date within `<TrackSummary>`: December 23, 2022. At this point, it looks like regex is the solution. I'll try [this written month regex](https://stackoverflow.com/a/35413952/2469390) with a lookahead and then run it through Python's dateutil parser. My original goal was to have neater XML returned from USPS so regex wouldn't be necessary. – Nathan Smeltzer Dec 24 '22 at 16:07
  • @NathanSmeltzer, I updated my answer base on your real XML data. Let me know it works or not. – Bench Vue Dec 24 '22 at 16:48
  • This works for delivered shipments, but not for certain tracking summaries, such as out for delivery statuses. I've updated the question with a temporary solution that covers all statuses I've received, along with a dict of all statuses. I really appreciate your work on a solution and I should have listed all possible statuses from the start had I known parsing/regex would be the only solution for now. – Nathan Smeltzer Dec 25 '22 at 14:21
  • @NathanSmeltzer, Hey I updated my answer base on your `TRACK_SUMMARIES` dictionary data. It can parsing with `re` library. I try explain but it is less description instead screen capture. But if you understand my approaches you can apply your desire work. Let me know I needs add information also my added two web sites for regular expression visualization and expression matching result. It will be help your understanding. Let me know I will help you. – Bench Vue Dec 26 '22 at 03:20
  • Thanks so much for the detailed explanation. Although longer than the temporary solution, yours seems better at getting accurate times. My temporary solution didn't return times for all dates using dateutil's parsing. I'm marking your answer as accepted. – Nathan Smeltzer Dec 26 '22 at 12:03
  • No problem, I am also learn a lot from your question. I hope to address your final goal with this topic. Happy Christmas and New Year! – Bench Vue Dec 26 '22 at 12:39
  • Thanks, happy holidays to you as well! – Nathan Smeltzer Dec 27 '22 at 10:57
0

The USPS API response lacked child elements such as <EventDate> because I was calling the Package Track API instead of the Package Tracking “Fields” API. Thanks to @Parfait for pointing this out.

Use Revision 0 instead of Revision 1 unless you need all of the additional fields. With Revision 0, you don't need to include the SourceId or ClientIp elements in your request.

import logging

from lxml import etree

logger = logging.getLogger(__name__)

def generate_url_tracking(tracking_numbers: list[str]) -> str:
    """generate the USPS tracking request url
    :param: tracking_numbers - list of strings of tracking numbers
    :return url: str tracking url for calling the USPS API
    """
    xml = generate_xml_tracking(tracking_numbers)
    url = f"{base_url}{url_vars['track']}{xml}"
    return url


def generate_xml_tracking(tracking_numbers: list[str]) -> str:
    """
    Generate USPS track and confirm API xml
    :param tracking_numbers: list of strings of tracking numbers
    :return: xml string
    """
    xml = etree.Element("TrackFieldRequest", {"USERID": config("USPS_USER")})
    # using 0 instead of 1 for the Revision allows us to skip the ClientIp and SourceId requirements
    etree.SubElement(xml, "Revision").text = "0"
    # etree.SubElement(xml, "ClientIp").text = "xxx.xxx.xxx.xxx"
    # etree.SubElement(xml, "SourceId").text = "ShipAware"
    # loop through tracking numbers
    for tracking in tracking_numbers:
        etree.SubElement(xml, "TrackID", {"ID": tracking})
    xml_string = etree.tostring(xml, encoding="utf8", method="xml").decode()
    logger.debug(f"xml_string: {xml_string}")
    return xml_string
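
Once the response contains the child elements, the date no longer needs regex. Below is a minimal sketch of reading it with the standard library, assuming the response has the shape of the documentation sample quoted in the question. The SAMPLE string is a trimmed copy of that sample with closing tags added, and summary_datetime is a hypothetical helper name. I haven't verified this against every live response, so treat the element names and date format as assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional

# Trimmed from the documentation sample in the question (closing tags added)
SAMPLE = """\
<TrackResponse>
  <TrackInfo ID="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX">
    <TrackSummary>
      <EventTime>9:00 am</EventTime>
      <EventDate>June 22, 2022</EventDate>
      <Event>Delivered, To Agent</Event>
      <EventCity>AMARILLO</EventCity>
      <EventState>TX</EventState>
    </TrackSummary>
  </TrackInfo>
</TrackResponse>
"""


def summary_datetime(xml_text: str) -> Optional[datetime]:
    """Combine EventDate and EventTime from the first TrackSummary, if present."""
    root = ET.fromstring(xml_text)
    summary = root.find(".//TrackSummary")
    if summary is None:
        return None
    event_date = summary.findtext("EventDate")
    if not event_date:
        return None  # element missing or empty
    event_time = summary.findtext("EventTime")
    if event_time:
        return datetime.strptime(f"{event_date} {event_time}", "%B %d, %Y %I:%M %p")
    # no time given: fall back to midnight on the event date
    return datetime.strptime(event_date, "%B %d, %Y")


print(summary_datetime(SAMPLE))  # 2022-06-22 09:00:00
```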