
I'd like to ship metrics from Telegraf to GCP's BigQuery.

How should I do this? What are the pitfalls?

FuzzyAmi

2 Answers


Having researched this, I think the best way to approach this problem is using Google's DataFlow. Here's the general idea:

  1. Use Telegraf's cloud_pubsub output plugin to ship JSON-formatted metrics to Google's Pub/Sub service
  2. Use Google's Dataflow service to ingest from Pub/Sub and ship to Google's BigQuery
  3. Profit!
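
For step 1, a minimal Telegraf output stanza might look like the following. The project, topic, and credentials path are placeholders, and option names should be verified against the cloud_pubsub plugin's README for your Telegraf version:

```toml
[[outputs.cloud_pubsub]]
  ## GCP project and Pub/Sub topic to publish to (placeholder values)
  project = "my-gcp-project"
  topic = "telegraf-metrics"
  ## Service-account key file generated in the GCP console
  credentials_file = "/etc/telegraf/gcp-credentials.json"
  ## Serialize metrics as JSON so Dataflow can map them onto the BigQuery schema
  data_format = "json"
```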

This may seem easy, but it requires quite a bit of knowledge about these tools. Here are some of the things you need to keep in mind:

  1. Start by figuring out what your metrics look like - and create a matching schema for them. BigQuery is schema-based and you MUST define the schema in advance, or else loading the data will fail. I found that defining an additional 'file' output for my Telegraf client was beneficial for seeing exactly what JSON metrics I was shipping into BigQuery. Note that you can also remove/edit metrics using Telegraf's measurement_filter
  2. Use Telegraf's cloud_pubsub plugin. As I'm writing this answer (Feb 2019), it's very new and only included in the master branch. I believe it's intended for Telegraf version 1.10.
  3. Create a pubsub topic, a BigQuery table (with a schema) and google storage bucket - you'll need all three items to create a DataFlow project. You'll also need a credentials file which you generate in GCP and then point your Telegraf client at.
  4. Once done, create a DataFlow project in GCP. These cannot be edited once created and annoyingly, they cannot be deleted (but they can be stopped). So you'll probably end up creating multiple projects until you get it right. I found that almost everything I did required a new project (for example, deleting and then recreating a BigQuery table with the same name required a new project).
  5. If you're seeing errors in Dataflow, you should be able to ascertain whether it's a schema issue from the convoluted stack trace in the logs sub-window in Dataflow. Your metrics MUST EXACTLY match the schema or they will not load into the table.
  6. I wasn't able to load metrics in influx format, but I'm told it should be possible. I used the 'json' format, and that worked.
  7. In case you have nested JSON fields ({"person": {"first-name": "john", ...}}), define the outer field as a RECORD type in the BigQuery schema, with the nested fields declared inside it. It's not very obvious unless you already know BigQuery.
  8. I discovered that editing the BigQuery schema in the GCP console was very limited - especially for nested JSON fields. I had to re-create it a few times from scratch because the UI won't allow editing nested fields after the initial table configuration.
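
To make the schema pitfalls above (points 1, 5, and 7) more concrete, here is a small Python sketch. The sample metric follows the shape Telegraf's 'json' serializer emits; the schema-checking helper is purely illustrative, not a real BigQuery API:

```python
import json

# Sample metric in the shape Telegraf's "json" serializer emits:
# top-level name/timestamp plus nested "fields" and "tags" objects.
metric = json.loads("""
{
  "fields": {"usage_idle": 91.5, "usage_user": 4.2},
  "name": "cpu",
  "tags": {"host": "web-01", "cpu": "cpu-total"},
  "timestamp": 1458229140
}
""")

# Hand-written description of the matching BigQuery schema. Nested JSON
# objects become RECORD columns whose sub-fields are declared inside them.
schema = {
    "name": "STRING",
    "timestamp": "TIMESTAMP",
    "fields": {"usage_idle": "FLOAT", "usage_user": "FLOAT"},  # RECORD column
    "tags": {"host": "STRING", "cpu": "STRING"},               # RECORD column
}

def matches(payload, schema):
    """Return True if every field in the payload exists in the schema,
    recursing into nested objects (BigQuery rejects rows whose fields
    don't match the table schema)."""
    for key, value in payload.items():
        if key not in schema:
            return False
        if isinstance(value, dict):
            if not isinstance(schema[key], dict) or not matches(value, schema[key]):
                return False
    return True

print(matches(metric, schema))  # True: every metric field has a schema entry
```

Running a check like this against the 'file' output mentioned in point 1 is a cheap way to catch mismatches before they surface as convoluted Dataflow stack traces.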
FuzzyAmi
    All is true! But my lazy approach is to just log the full JSON string into BigQuery (+ a timestamp) and parse later inside BigQuery. Thus I ensure that no data is lost, and I'm free to iterate finding the best schema later. – Felipe Hoffa Feb 06 '19 at 01:34
  • @FelipeHoffa - interesting. How do you ship the metrics as a single string using telegraf? – FuzzyAmi Feb 06 '19 at 06:53
  • I don't have experience with Telegraf, but probably I would change something here (https://github.com/influxdata/telegraf/blob/master/plugins/outputs/cloud_pubsub/pubsub.go, toMessages) or change the Dataflow code. Not sure why you have to delete projects tho, that was a weird part of the recipe btw (now that I re-read it) – Felipe Hoffa Feb 06 '19 at 07:25
  • @FelipeHoffa Seems that DataFlow projects are immutable, so if you want to modify something after you've created it, you can't. You simply HAVE to create a new one. After a while, you start having quite a few, which clutters your UI. Unfortunately, you can't delete them - it's something Google has yet to implement (https://stackoverflow.com/questions/46038981/how-to-delete-a-gcloud-dataflow-job) – FuzzyAmi Feb 06 '19 at 07:30
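
Felipe's "lazy" approach from the comments can be sketched in Python: wrap each metric into a two-column row (insertion timestamp plus the raw JSON string), so the table schema never needs to change as the metrics evolve. The function name and row keys here are illustrative, not part of any Telegraf or BigQuery API:

```python
import json
import time

def to_raw_row(metric):
    """Wrap an arbitrary metric dict into a two-column row: an insertion
    timestamp plus the full JSON payload as one string. The BigQuery table
    then needs only a TIMESTAMP column and a STRING column; field extraction
    (e.g. with BigQuery's JSON functions) is deferred to query time."""
    return {"ts": int(time.time()), "raw": json.dumps(metric, sort_keys=True)}

row = to_raw_row({"name": "cpu", "fields": {"usage_idle": 91.5}})
print(row["raw"])
```

The trade-off is that nothing is lost to schema mismatches, at the cost of parsing the JSON inside BigQuery on every query.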

As of version 1.18.0, Telegraf ships with an official BigQuery output plugin.
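
A minimal configuration sketch for that plugin follows. The project and dataset values are placeholders, and the exact option names should be checked against the plugin's README for your Telegraf version; as I understand it, the plugin writes each metric to a table named after the metric:

```toml
[[outputs.bigquery]]
  ## Service-account key file (placeholder path)
  credentials_file = "/etc/telegraf/gcp-credentials.json"
  ## GCP project and BigQuery dataset to write to (placeholder values)
  project = "my-gcp-project"
  dataset = "telegraf"
  ## Timeout for BigQuery operations
  timeout = "5s"
```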

cameronhimself