I have a script that pulls a JSON file from the City of Chicago data portal and publishes it to Pub/Sub. Once the data is in Pub/Sub, a Dataflow template loads it into Google BigQuery. The final move to BigQuery is failing, and when I print the output in the script I get a u' in front of all the fields, which I believe is breaking the field matching. Has anyone else had this issue? What is wrong with my code, and how can I remove the 'u'? I have tried multiple fixes, but none of them have worked. A sample output is listed below:
('_last_updt', '2010-07-21 14:50:53.0'), ('_length', '0.69'), ('_lif_lat', '41.985032613'),
My code is listed below:
from __future__ import unicode_literals
from sodapy import Socrata
import json
from io import StringIO
from google.oauth2 import service_account
from oauth2client.client import GoogleCredentials
from google.cloud import pubsub_v1
import time
import datetime
import urllib
import urllib.request
import argparse
import base64
credentials = GoogleCredentials.get_application_default()
# change project to your Project ID
project="xxxx"
# change topic to your PubSub topic name
topic="xxxx"
res = urllib.request.urlopen('https://data.cityofchicago.org/resource/8v9j-bter.json')
res_body = res.read()
traffic=json.loads(res_body)
publisher = pubsub_v1.PublisherClient()
topicName = 'projects/' + project + '/topics/' + topic
topic_path = publisher.topic_path(project,topic)
for key in traffic:
    publisher.publish(topicName,str.encode(str(key)))
    print(key.items())
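
One possible cause, offered as a sketch rather than a confirmed diagnosis: str(key) publishes the Python repr of each dict, which uses single quotes (and, under Python 2, a u' prefix on unicode strings), so the message body is not valid JSON for a downstream consumer that parses JSON. The example below contrasts that with json.dumps. The helper name encode_record and the sample row are hypothetical, and I am assuming the Dataflow template expects one JSON object per message.

```python
import json


def encode_record(record):
    """Serialize one row to UTF-8 JSON bytes for use as a message payload."""
    return json.dumps(record).encode("utf-8")


# Hypothetical row shaped like the sample output in the question.
row = {"_last_updt": "2010-07-21 14:50:53.0", "_length": "0.69"}

as_repr = str(row)            # Python repr: single-quoted, not JSON
as_json = encode_record(row)  # JSON bytes: double-quoted keys and values

print(as_repr)
print(as_json.decode("utf-8"))

# The JSON form parses back to the original row; the repr form does not parse.
assert json.loads(as_json) == row
try:
    json.loads(as_repr)
except ValueError:
    print("str(row) is not valid JSON")
```

Under that assumption, the loop body would become publisher.publish(topic_path, encode_record(key)), with the rest of the script unchanged.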