  • I have an application that scrapes websites
  • It then checks the database to see whether the data is already saved:
def filter_out_already_saved_events(venue_id, events):
    db_events = [x for x in event_dao if x['venueId'] == venue_id]
    return [x for x in events if not compare(x, db_events)]

def compare(x, db_events):
    for y in db_events:
        if y['time'] == x['time'] and y['title'] == x['title']:
            return True
    return False
  • I have saved all the scraped data in the database
  • So I expect to get an empty list
  • When I run the code on my local machine I get what I expect
  • When I deploy the code in a Docker container, I get all the data back even though I expect nothing.
  • I logged the values and can see that the timestamp is different in the Docker container:
    • Scraped timestamp 1636574400
    • database timestamp 1636570800
    • difference is exactly 3600 seconds (1 hour)
  • On the local machine the timestamps are the same
    • Scraped timestamp 1636570800
    • database timestamp 1636570800

Timestamp is created from a string with the following code:

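from datetime import datetime

def get_time(date, time):
    # year and last are module-level state; mdr presumably maps the scraped
    # month token to a month number (all three are defined elsewhere)
    global year, last
    fmt = '%Y-%m-%dT%H:%M'
    day, month = [x.strip() for x in date.split('.')]
    month = mdr[month]
    # A month lower than the previous one means the listing rolled into the next year
    if int(month) < last:
        year = year + 1
    last = int(month)
    string = f'{year}-{month}-{day}T{time}'
    # strptime returns a naive datetime; timestamp() reads it in the system's local timezone
    obj = datetime.strptime(string, fmt)
    return int(datetime.timestamp(obj))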

As for Docker, I am using the image python:3. On the local machine I am using Python 3.9.7.

Comparing all the scraped data on Docker vs. the local machine, I get the following differences:

The function get_time(date, time) returns:

DOCKER     |    LOCAL     |   DIFF
1636574400   1636570800       3600
1636660800   1636657200       3600
1636747200   1636743600       3600
1636833600   1636830000       3600
1637179200   1637175600       3600
1637265600   1637262000       3600
1637352000   1637348400       3600
1637438400   1637434800       3600
1637784000   1637780400       3600
1637870400   1637866800       3600
1637956800   1637953200       3600
1638043200   1638039600       3600
1639252800   1639249200       3600
1643227200   1643223600       3600
1643313600   1643310000       3600
1643400000   1643396400       3600
1645214400   1645210800       3600
1645819200   1645815600       3600
1647720000   1647716400       3600
1648324800   1648321200       3600
1649448000   1649440800       7200
1651262400   1651255200       7200
1667678400   1667674800       3600
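
One way to read the table is to decode a pair from a 3600 row and a pair from a 7200 row as UTC and compare the resulting hours. This is only a minimal sketch; the helper name `as_utc` is made up for illustration.

```python
from datetime import datetime, timezone

def as_utc(ts):
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

# (docker, local) pairs copied from the first row and the first 7200 row of the table
pairs = [(1636574400, 1636570800), (1649448000, 1649440800)]

for docker_ts, local_ts in pairs:
    print(as_utc(docker_ts), as_utc(local_ts), docker_ts - local_ts)

# 2021-11-10T20:00:00+00:00 2021-11-10T19:00:00+00:00 3600
# 2022-04-08T20:00:00+00:00 2022-04-08T18:00:00+00:00 7200
```

The Docker values land on 20:00 UTC, while the local values land on 19:00 and 18:00 UTC. That is what a naive datetime would produce if the container ran in UTC and the local machine in a zone at UTC+1 in winter and UTC+2 in summer. The local zone is a guess; it is not stated in the question.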
mama
  • Add logs/print statements at each step and debug. You can easily view the output of a running container by launching in attached mode or by using `docker logs -f ` – Irfanuddin Nov 06 '21 at 13:24
  • You sure your docker is interacting with the same database? – Ujjwal Agrawal Nov 06 '21 at 13:44
  • yes 100 % sure! – mama Nov 06 '21 at 13:50
  • I just followed @iudeen's advice and found out that the integer timestamp produced by the scraper is different on the Docker instance. I have no idea why. (Just updated the question) – mama Nov 06 '21 at 14:01
  • Yes I just marked my question closed with this answer as the answer. – mama Nov 06 '21 at 21:50
  • Docker containers mostly run in UTC. You can change the timezone of the container or make your datetime objects tz-aware. – Irfanuddin Nov 07 '21 at 23:06
  • Thank you, I made it tz-aware by adding `timestamp = dt.replace(tzinfo=timezone.utc).timestamp()` when creating the timestamp – mama Nov 07 '21 at 23:07
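
The `replace(tzinfo=timezone.utc)` approach from the last comment can be sketched roughly as follows. `to_timestamp` and `FMT` are hypothetical names, and the fixed `+01:00` offset is only there to reproduce the local value from the question for comparison; it is an assumption about the local machine, not something confirmed in the thread.

```python
from datetime import datetime, timedelta, timezone

FMT = '%Y-%m-%dT%H:%M'  # same format string as in get_time

def to_timestamp(string, tz=timezone.utc):
    """Parse a naive date string and pin it to an explicit timezone
    before converting, so the system timezone no longer matters."""
    obj = datetime.strptime(string, FMT)
    return int(obj.replace(tzinfo=tz).timestamp())

utc_ts = to_timestamp('2021-11-10T20:00')
# Assumed local offset of +01:00, used only for comparison
plus_one_ts = to_timestamp('2021-11-10T20:00', timezone(timedelta(hours=1)))

print(utc_ts, plus_one_ts, utc_ts - plus_one_ts)
# 1636574400 1636570800 3600
```

Pinned to UTC, the result is the same on every machine regardless of its timezone setting. Note that it matches the Docker column of the table, so rows saved earlier from the local machine would presumably still differ until they are re-saved.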
