I have a generator function (generate_email_activity()) which is iterated over within a second function (to_amplitude()). The generator function yields a dictionary.
In the loop that iterates over the generator I have a problem: I apply transformations to values within the dictionary, and those changes persist into the next iteration of the loop.
Here's a minimal version of my code:
from pprint import pprint
import datetime

def epoch(dt):
    time = datetime.datetime.utcfromtimestamp(0)
    return (dt - time).total_seconds() * 1000

def generate_email_activity(n):
    for i in range(n):
        shipper = {'join_date': datetime.datetime.now()}
        for j in range(5):
            event = {'user_properties': shipper}
            yield event

def to_amplitude(generator,n):
    datagen = generator(n)
    for data in datagen:
        print data['user_properties']['join_date']
        data['user_properties']['join_date'] = epoch(data['user_properties']['join_date'])
        pprint(data)

to_amplitude(generate_email_activity,5)
Here is the output:
2016-05-20 10:31:18.023000
{'user_properties': {'join_date': 1463740278023.0}}
1.46374027802e+12
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-41-0386bfffad24> in <module>()
14 pprint(data)
15
---> 16 to_amplitude(generate_email_activity,5)
<ipython-input-41-0386bfffad24> in to_amplitude(generator, n)
11 for data in datagen:
12 print data['user_properties']['join_date']
---> 13 data['user_properties']['join_date'] = epoch(data['user_properties']['join_date'])
14 pprint(data)
15
<ipython-input-24-2a88ede629ea> in epoch(dt)
313 def epoch(dt):
314 time = datetime.datetime.utcfromtimestamp(0)
--> 315 return (dt - time).total_seconds() * 1000
316
317 def to_amplitude(generator,n):
TypeError: unsupported operand type(s) for -: 'float' and 'datetime.datetime'
So the second time around the loop in to_amplitude(), the value of data['user_properties']['join_date'] is still the float produced by the first iteration rather than a fresh datetime. Of course the epoch() function is not happy then, because it only accepts a datetime.
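To narrow this down, here is a smaller check. It is only a sketch: it is written with Python 3 syntax, uses a fixed date, and gen_events() is a hypothetical stand-in for generate_email_activity(), not my real code. It appears to show that consecutive events for one shipper hold the very same inner dictionary object, so a change made through one event is visible through the next:

```python
import datetime

def gen_events(n, per_shipper=2):
    # Hypothetical stand-in for generate_email_activity(): one inner dict
    # per shipper, reused by every event yielded in the inner loop.
    for i in range(n):
        shipper = {'join_date': datetime.datetime(2016, 5, 20)}
        for j in range(per_shipper):
            yield {'user_properties': shipper}

events = gen_events(1)
first = next(events)
# Mutate the inner dict through the first event only.
first['user_properties']['join_date'] = 'changed'
second = next(events)

# Both events reference the same inner dict object.
same_object = first['user_properties'] is second['user_properties']
print(same_object)                               # True
print(second['user_properties']['join_date'])    # changed
```

The outer event dictionaries are distinct objects, but their 'user_properties' values are not.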
Now if I collapse the dictionary by one level as in the following code:
from pprint import pprint
import datetime

def epoch(dt):
    time = datetime.datetime.utcfromtimestamp(0)
    return (dt - time).total_seconds() * 1000

def generate_email_activity(n):
    for i in range(n):
        shipper = datetime.datetime.now()
        for j in range(5):
            event = {'user_properties': shipper}
            yield event

def to_amplitude(generator,n):
    datagen = generator(n)
    for data in datagen:
        print data['user_properties']
        data['user_properties'] = epoch(data['user_properties'])
        pprint(data)

to_amplitude(generate_email_activity,5)
The problem disappears. Below I include a few lines of the output:
2016-05-20 10:45:29.303000
{'user_properties': 1463741129303.0}
2016-05-20 10:45:29.303000
{'user_properties': 1463741129303.0}
2016-05-20 10:45:29.303000
{'user_properties': 1463741129303.0}
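For comparison, the same kind of check on the collapsed version. Again this is a sketch in Python 3 syntax with a fixed date, and gen_flat() is a hypothetical helper rather than my real code. Here the assignment seems to replace a key in the freshly created event dictionary only, so the next event still receives the original datetime:

```python
import datetime

def gen_flat(per_shipper=2):
    # Hypothetical stand-in for the collapsed generator: the shared value
    # is a datetime, and each event is a newly created outer dict.
    shipper = datetime.datetime(2016, 5, 20)
    for j in range(per_shipper):
        yield {'user_properties': shipper}

flat = gen_flat()
a = next(flat)
# Rebinds the key in this one event dict; the datetime itself is untouched.
a['user_properties'] = 0.0
b = next(flat)

still_datetime = isinstance(b['user_properties'], datetime.datetime)
print(still_datetime)   # True
print(a is b)           # False
```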
This is a stripped-down version of the code, so I don't want to apply the fix I just showed you. I'd just like to understand the problem, please.