I have a generator function, generate_email_activity(), that a second function, to_amplitude(), iterates over. The generator yields dictionaries.

In the loop that iterates over the generator, I transform values inside the yielded dictionary, and those changes persist into the next iteration of the loop.

Here's a minimal version of my code:

from pprint import pprint
import datetime

def epoch(dt):
    time = datetime.datetime.utcfromtimestamp(0)
    return (dt - time).total_seconds() * 1000

def generate_email_activity(n):
    for i in range(n):
        shipper = {'join_date': datetime.datetime.now()}
        for j in range(5):
            event = {'user_properties': shipper}
            yield event

def to_amplitude(generator,n):
    datagen = generator(n)
    for data in datagen:
        print data['user_properties']['join_date']
        data['user_properties']['join_date'] = epoch(data['user_properties']['join_date'])
        pprint(data)

to_amplitude(generate_email_activity,5)

Here is the output:

2016-05-20 10:31:18.023000
{'user_properties': {'join_date': 1463740278023.0}}
1.46374027802e+12
---------------------------------------------------------------------------          
TypeError                                 Traceback (most recent call last)
<ipython-input-41-0386bfffad24> in <module>()
     14         pprint(data)
     15 
---> 16 to_amplitude(generate_email_activity,5)

<ipython-input-41-0386bfffad24> in to_amplitude(generator, n)
     11     for data in datagen:
     12         print data['user_properties']['join_date']
---> 13         data['user_properties']['join_date'] = epoch(data['user_properties']['join_date'])
     14         pprint(data)
     15 

<ipython-input-24-2a88ede629ea> in epoch(dt)
    313 def epoch(dt):
    314     time = datetime.datetime.utcfromtimestamp(0)
--> 315     return (dt - time).total_seconds() * 1000
    316 
    317 def to_amplitude(generator,n):

TypeError: unsupported operand type(s) for -: 'float' and 'datetime.datetime'

So on the second pass through the loop in to_amplitude(), data['user_properties']['join_date'] still holds the float written during the first pass. epoch() then fails, because it only accepts a datetime.

Now if I collapse the dictionary by one level as in the following code:

from pprint import pprint
import datetime

def epoch(dt):
    time = datetime.datetime.utcfromtimestamp(0)
    return (dt - time).total_seconds() * 1000

def generate_email_activity(n):
    for i in range(n):
        shipper = datetime.datetime.now()
        for j in range(5):
            event = {'user_properties': shipper}
            yield event

def to_amplitude(generator,n):
    datagen = generator(n)
    for data in datagen:
        print data['user_properties']
        data['user_properties'] = epoch(data['user_properties'])
        pprint(data)

to_amplitude(generate_email_activity,5)

The problem disappears. Below I include a few lines of the output:

2016-05-20 10:45:29.303000
{'user_properties': 1463741129303.0}
2016-05-20 10:45:29.303000
{'user_properties': 1463741129303.0}
2016-05-20 10:45:29.303000
{'user_properties': 1463741129303.0}

This is a stripped-down version of my code, so I don't want to simply apply the workaround shown above. I'd just like to understand the problem.

Alexander Soare
  • `{'user_properties': shipper}` will not create a new dictionary containing a copy of `shipper`, but a dictionary with a reference to `shipper` as value, and so all yielded dictionaries point to the same `shipper`. – Ilja Everilä May 20 '16 at 08:55
  • Excellent! Thanks. If anyone is interested I just changed `event = {'user_properties': shipper}` to `event = {'user_properties': shipper.copy()}` and all was fine. – Alexander Soare May 20 '16 at 08:59
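
The two comments above can be illustrated with a small sketch. It is not the original code: `shared_events` and `copied_events` are hypothetical stand-ins for `generate_email_activity()`, the join date is a fixed made-up value, and `epoch()` is rewritten with an explicit 1970-01-01 datetime so the snippet also runs on Python 3.

```python
import datetime


def epoch(dt):
    # Milliseconds since the Unix epoch for a naive UTC datetime.
    return (dt - datetime.datetime(1970, 1, 1)).total_seconds() * 1000


def shared_events(n):
    # Hypothetical stand-in for the question's generator:
    # every yielded event wraps the SAME shipper dict.
    shipper = {'join_date': datetime.datetime(2016, 5, 20, 10, 31, 18)}
    for _ in range(n):
        yield {'user_properties': shipper}


def copied_events(n):
    # Same generator with the fix from the comment:
    # each event gets its own shallow copy of shipper.
    shipper = {'join_date': datetime.datetime(2016, 5, 20, 10, 31, 18)}
    for _ in range(n):
        yield {'user_properties': shipper.copy()}


# Shared inner dict: mutating it through one event changes the other.
first, second = list(shared_events(2))
print(first['user_properties'] is second['user_properties'])  # True
first['user_properties']['join_date'] = epoch(first['user_properties']['join_date'])
print(type(second['user_properties']['join_date']).__name__)  # float

# Copied inner dict: the second event keeps its datetime.
a, b = list(copied_events(2))
print(a['user_properties'] is b['user_properties'])  # False
a['user_properties']['join_date'] = epoch(a['user_properties']['join_date'])
print(type(b['user_properties']['join_date']).__name__)  # datetime
```

This would also explain why the flattened version in the question works: there the assignment replaces a key in the freshly created outer dictionary, while the nested version mutates the one inner dictionary that every event refers to. A shallow `.copy()` is enough here because the only value is an immutable datetime; if `shipper` held nested mutable values, `copy.deepcopy` would probably be the safer choice.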
