I am reading the Beam documentation and some Stack Overflow questions/answers in order to understand how to write a Pub/Sub message to BigQuery. As of now, I have a working example that receives protobuf messages and decodes them. The code looks like this:
(p
| 'ReadData' >> apache_beam.io.ReadFromPubSub(topic=known_args.input_topic, with_attributes=True)
| 'ParsePubsubMessage' >> apache_beam.Map(parse_pubsubmessage)
)
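
For context, parse_pubsubmessage decodes the protobuf payload and bundles it with the message attributes. It looks roughly like this sketch, where OrderEvent is a stand-in name for my generated protobuf class and DecodedPubsubMessage is the class shown further down:

# OrderEvent is a stand-in for my generated protobuf message class.
from orders_pb2 import OrderEvent

def parse_pubsubmessage(msg):
    # With with_attributes=True, msg is a PubsubMessage exposing the raw
    # payload as msg.data and the attributes dict as msg.attributes.
    event = OrderEvent()
    event.ParseFromString(msg.data)  # decode the protobuf bytes
    return DecodedPubsubMessage(msg.attributes, event)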
Eventually, what I want to do is write the decoded Pub/Sub message to BigQuery. All attributes (and the decoded byte data) will have a one-to-one column mapping.
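
Concretely, I am picturing a table whose columns mirror those fields one to one. In the 'name:TYPE' string form that WriteToBigQuery accepts, the schema would be something like this (the BigQuery types are my guesses, e.g. triggered_at as TIMESTAMP):

# Assumed column types; adjust to the real attribute/event types.
SCHEMA = (
    'attribute_one:STRING,attribute_two:STRING,order_id:STRING,'
    'sku:STRING,triggered_at:TIMESTAMP,status:STRING'
)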
What is confusing me is what my parse_pubsubmessage should return. As of now, it returns a custom class that holds all the fields, i.e.:
class DecodedPubsubMessage:
    def __init__(self, attr, event):
        self.attribute_one = attr['attribute_one']
        self.attribute_two = attr['attribute_two']
        self.order_id = event.order.order_id
        self.sku = event.item.item_id
        self.triggered_at = event.timestamp
        self.status = event.order.status
Is this the correct approach for this Dataflow pipeline? My thinking was that I would use this returned value to write to BigQuery, but because of the more advanced Python features used in the examples, I couldn't work out how. Here is a reference example that I was looking at. From that example, I am not sure how I would do the lambda map on my returned object to write to BigQuery.
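
To make the question concrete, this is roughly what I was attempting: a lambda that flattens my object into the dict rows that WriteToBigQuery expects, using the SCHEMA string from above (the table spec my-project:my_dataset.my_table is a placeholder):

(p
 | 'ReadData' >> apache_beam.io.ReadFromPubSub(topic=known_args.input_topic, with_attributes=True)
 | 'ParsePubsubMessage' >> apache_beam.Map(parse_pubsubmessage)
 # Flatten the custom object into a dict keyed by column name.
 | 'ToTableRow' >> apache_beam.Map(lambda m: {
       'attribute_one': m.attribute_one,
       'attribute_two': m.attribute_two,
       'order_id': m.order_id,
       'sku': m.sku,
       'triggered_at': m.triggered_at,
       'status': m.status})
 | 'WriteToBigQuery' >> apache_beam.io.WriteToBigQuery(
       'my-project:my_dataset.my_table',  # placeholder table spec
       schema=SCHEMA,
       write_disposition=apache_beam.io.BigQueryDisposition.WRITE_APPEND,
       create_disposition=apache_beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
)

Is that the right shape, or would it be more idiomatic to have parse_pubsubmessage return a plain dict directly so the extra Map isn't needed?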