
I am writing a small script that prints a message every time my UPS tracking page is updated.

Right now I have a script that looks like this:

    import sys

    import requests
    from bs4 import BeautifulSoup as soup

    # Assumed setup: `s` is a requests session.
    s = requests.Session()

    # `url` holds the tracking number; I can't share it in case someone tampers with my tracking.
    tracking_full_site = 'https://wwwapps.ups.com/WebTracking/track?track=yes&trackNums=' + url

    headers = {
        'User-Agent': ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
                       ' (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36')
    }
    resp = s.get(tracking_full_site, headers=headers, timeout=12)
    resp.raise_for_status()

    bs4 = soup(resp.text, 'lxml')
    old_list = []

    # Each tracking event is a <tr valign="top"> row; collapse its text to one line.
    for item in bs4.findAll('tr', {'valign': 'top'}):
        where_is_it = " ".join(item.text.split())
        old_list.append(where_is_it)

    print(old_list)

    sys.exit()

However, the output I get is:

United States 28.08.2018 6:16 Package departed international carrier facility
Edgewood, NY, United States 27.08.2018 20:00 Package transferred to post office
United States 27.08.2018 18:42 Package processed by international carrier
EDGEWOOD, NY, United States 24.08.2018 15:51 Package processed by UPS Mail Innovations origin facility
24.08.2018 12:55 Package received for processing by UPS Mail Innovations
United States 22.08.2018 8:19 Shipment information received by UPS Mail Innovations

which looks pretty good thanks to " ".join(item.text.split()).

My question is: how can I split each row so that I can print just the country, the date, the time, or the description?

EDIT:

This is the full HTML, if anyone wants to see it:

<table summary="" border="0" cellpadding="0" cellspacing="0" class="dataTable">
   <tbody>
      <tr>
         <th scope="col">Location</th>
         <th scope="col">Date</th>
         <th scope="col">Local Time</th>
         <th scope="col" class="full">Activity&nbsp;(<a class="btnlnkR helpIconR" href="javascript:helpModLvl('https://www.ups.com/content/se/en/tracking/tracking/description.html')">What's this?</a>)</th>
      </tr>
      <tr valign="top">
         <td class="nowrap">
            United States
         </td>
         <td class="nowrap">
            28.08.2018
         </td>
         <td class="nowrap">
            6:16
         </td>
         <td>Package departed international carrier facility</td>
      </tr>
      <tr valign="top" class="odd">
         <td class="nowrap">
            Edgewood,&nbsp;
            NY,&nbsp;
            United States
         </td>
         <td class="nowrap">
            27.08.2018
         </td>
         <td class="nowrap">
            20:00
         </td>
         <td>Package transferred to post office</td>
      </tr>
      <tr valign="top">
         <td class="nowrap">
            United States
         </td>
         <td class="nowrap">
            27.08.2018
         </td>
         <td class="nowrap">
            18:42
         </td>
         <td>Package processed by international carrier</td>
      </tr>
      <tr valign="top" class="odd">
         <td class="nowrap">
            EDGEWOOD,&nbsp;
            NY,&nbsp;
            United States
         </td>
         <td class="nowrap">
            24.08.2018
         </td>
         <td class="nowrap">
            15:51
         </td>
         <td>Package processed by UPS Mail Innovations origin facility</td>
      </tr>
      <tr valign="top">
         <td class="nowrap">
         </td>
         <td class="nowrap">
            24.08.2018
         </td>
         <td class="nowrap">
            12:55
         </td>
         <td>Package received for processing by UPS Mail Innovations</td>
      </tr>
      <tr valign="top" class="odd">
         <td class="nowrap">
            United States
         </td>
         <td class="nowrap">
            22.08.2018
         </td>
         <td class="nowrap">
            8:19
         </td>
         <td>Shipment information received by UPS Mail Innovations</td>
      </tr>
   </tbody>
</table>

My desired output would be, for example:

Country: United States
Date: 28.08.2018
Time: 6:16
Description: Package departed international carrier facility

As you can see in the output, not every row has a country. Be aware of that!

For the author of one of the answers, this is what I get with your code:

['Sweden', '29.08.2018', '11:08', 'Package arrived at international carrier']
['United States', '28.08.2018', '6:16', 'Package departed international carrier facility']
['Edgewood,\t\t\t\t\t\t\t\n\n\t\t\t\t            \n\t\t\t\t            \t\n\t\t\t\t            \tNY,\t\t\t\t            \n\n\t\t\t\t            \n\t\t\t\t            \t\n\t\t\t\t            \tUnited States', '27.08.2018', '20:00', 'Package transferred to post office']
['United States', '27.08.2018', '18:42', 'Package processed by international carrier']
['EDGEWOOD,\t\t\t\t\t\t\t\n\n\t\t\t\t            \n\t\t\t\t            \t\n\t\t\t\t            \tNY,\t\t\t\t            \n\n\t\t\t\t            \n\t\t\t\t            \t\n\t\t\t\t            \tUnited States', '24.08.2018', '15:51', 'Package processed by UPS Mail Innovations origin facility']
['', '24.08.2018', '12:55', 'Package received for processing by UPS Mail Innovations']
['United States', '22.08.2018', '8:19', 'Shipment information received by UPS Mail Innovations']
CDNthe2nd

2 Answers

array = []
# `soup` here is the parsed BeautifulSoup document (called `bs4` in the question).
for item in soup.findAll('tr', {'valign': 'top'}):
    # split() + join() collapses tabs, newlines and non-breaking spaces inside each cell.
    array.append([" ".join(f.text.split()) for f in item.findAll("td")])

output = []
for e in array:
    output.append({"Country": e[0], "Date": e[1], "Time": e[2], "Description": e[3]})

If you want to print only the country, iterate over the list:

for element in output:
    print(element["Country"])
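
The whitespace problem discussed in the comments on this answer can be reproduced without any network access. The following is a minimal sketch, not the answerer's exact code: `clean_cell`, `row_to_record` and `FIELDS` are hypothetical names, and `raw_row` is sample data modeled on the output shown in the question. It shows that `strip()` only trims the ends of a string, while `" ".join(text.split())` also collapses the tabs, newlines and non-breaking spaces in the middle and keeps single spaces between words.

```python
FIELDS = ("Country", "Date", "Time", "Description")


def clean_cell(text):
    """Collapse every run of whitespace (tabs, newlines, non-breaking spaces) into one space."""
    return " ".join(text.split())


def row_to_record(cells):
    """Map the four cell texts of one table row onto named fields."""
    return dict(zip(FIELDS, (clean_cell(c) for c in cells)))


# Sample row modeled on the question's output (assumed, not fetched from UPS).
raw_row = [
    "Edgewood,\xa0\n\t\t\t\tNY,\xa0\n\t\t\t\tUnited States",
    "27.08.2018",
    "20:00",
    "Package transferred to post office",
]

record = row_to_record(raw_row)
for name in FIELDS:
    # An empty location cell is printed as "-" (an arbitrary placeholder).
    print(f"{name}: {record[name] or '-'}")
```

Chaining `.replace(" ", "")` would also remove the whitespace, but it deletes the spaces between words too, so "United States" would become "UnitedStates".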
GraphicalDot
  • Some of the countries contain `\t\t\t\t\t\t\t\n\n\t\t\t\t \n\t\t\t\t \t\n\t\t\t\t \tNY,\t\t\t\t \n\n\t\t\t\t \n\t\t\t\t \t\n\t\t\t\t \t` – CDNthe2nd Aug 30 '18 at 17:01
  • Please run the code, strip() will take care of \n\t\t\t\t . – GraphicalDot Aug 30 '18 at 17:07
  • I copied and pasted the code and it ran, but in the end I got `{'Country': 'Edgewood,\t\t\t\t\t\t\t\n\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\tNY,\t\t\t\t\n\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\tUnited States'` – CDNthe2nd Aug 30 '18 at 17:08
  • I edited the post again so you can see how it looks. – CDNthe2nd Aug 30 '18 at 17:11
  • I had to add this: `array.append([f.text.strip().replace("\xa0\n", "").replace("\t", "").replace("\n", "").replace(" ", "") for f in item.findAll("td")])` It seems to work now! But how do I then print out, for example, just the countries (outside the loop, of course)? – CDNthe2nd Aug 30 '18 at 17:13
  • Iterate over the list and print just the Country key. I updated the code above. – GraphicalDot Aug 30 '18 at 17:17
  • Awesome! I think this is what I wanted! Can't thank you enough! I will mark this as the answer! – CDNthe2nd Aug 30 '18 at 17:20
  • Glad you liked it, Thank you. – GraphicalDot Aug 30 '18 at 17:21

Once you have the GET response, put it in a variable (respString), then parse it. The idea is to read through the HTML and identify where the information is.

If you are targeting this part of the HTML:

<tr valign="top" class="odd">
   <td class="nowrap">
      United States
   </td>
   <td class="nowrap">
      22.08.2018
   </td>
   <td class="nowrap">
      8:19
   </td>
   <td>Shipment information received by UPS Mail Innovations</td>
</tr>

This should get you the "United States" part from parsing the HTML:

var startIndex = respString.indexOf('<td class="nowrap">');
var tempRespString = respString.substring(startIndex);
var tempStartIndex = tempRespString.indexOf('>');
var tempEndIndex = tempRespString.indexOf('</');
var country = tempRespString.substring(tempStartIndex + 1, tempEndIndex);

If there are several matching strings and you need a later one, say the third ...

'<td class="nowrap">'

... then find the first occurrence, cut the string off just past it, do the same for the second occurrence, and so on, until you reach the information you want.

Just get creative and find ways of parsing the data you want from an HTML response.
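
Since the question is about Python, the same index-based logic might look roughly like this. It is only a sketch under assumptions: `nth_cell_text` is a hypothetical helper name, `row` is the sample row shown above rather than a live response, and `str.find` stands in for JavaScript's `indexOf`. Passing a start position to `find` replaces the repeated "cut off the first occurrence" step.

```python
def nth_cell_text(html, n, marker='<td class="nowrap">'):
    """Return the text of the n-th (1-based) cell opened by `marker`, or None if absent."""
    pos = -1
    for _ in range(n):
        # Continue searching just past the previous match.
        pos = html.find(marker, pos + 1)
        if pos == -1:
            return None
    start = pos + len(marker)
    end = html.find('</', start)  # first closing tag after the cell's text
    if end == -1:
        return None
    # Collapse the surrounding newlines and indentation.
    return " ".join(html[start:end].split())


row = """<tr valign="top" class="odd">
   <td class="nowrap">
      United States
   </td>
   <td class="nowrap">
      22.08.2018
   </td>
   <td class="nowrap">
      8:19
   </td>
   <td>Shipment information received by UPS Mail Innovations</td>
</tr>"""

print(nth_cell_text(row, 1))  # United States
print(nth_cell_text(row, 3))  # 8:19
```

This kind of string slicing is brittle if the markup changes (for example a different attribute order), which is one reason an HTML parser is usually the safer choice.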

Ying Li
  • Isn't that JavaScript though? I'm using Python :'( – CDNthe2nd Aug 30 '18 at 16:54
  • Yeah, sorry about that. That's literally from a piece of my working code that does almost exactly what you are trying to do. Just use the same logic in Python... Here's the [syntax](https://stackoverflow.com/q/2294493/4092401) for indexOf. – Ying Li Aug 30 '18 at 17:00
  • Oh, all right! I will try and see if I can make it work! Appreciate it a lot :) – CDNthe2nd Aug 30 '18 at 17:02