
I am using this GitHub repo and folder I found: https://github.com/entechlog/kafka-examples/tree/master/kafka-connect-standalone to run Kafka Connect locally in standalone mode. I have made some changes to the Docker Compose file, but mainly changes that pertain to authentication.

The problem I am now having is that when I run the Docker image, I get this error multiple times, once for each partition (there are 10 of them, 0 through 9):

[2021-12-07 19:03:04,485] INFO [bq-sink-connector|task-0] [Consumer clientId=connector-consumer-bq-sink-connector-0, groupId=connect-bq-sink-connector] Found no committed offset for partition <topic name here>-<partition number here> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1362)

I don't think there are any issues with authenticating or connecting to the endpoint(s); I think the consumer (the Connect sink) is not committing its offsets back.

Am I missing an environment variable? You will see this Docker Compose file has `CONNECT_OFFSET_STORAGE_FILE_FILENAME: /tmp/connect.offsets`. I tried adding `CONNECTOR_OFFSET_STORAGE_FILE_FILENAME: /tmp/connect.offsets` (`CONNECT_` vs. `CONNECTOR_`), and then I get an error `Failed authentication with <Kafka endpoint here>`, so now I'm just going in circles.
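For context, the worker-related portion of my Compose file looks roughly like this (a trimmed sketch, not the exact file from the repo; the image tag and service name are placeholders, and the auth settings are omitted):

```yaml
kafka-connect:
  image: confluentinc/cp-kafka-connect:latest   # placeholder tag
  environment:
    # Worker-level settings use the CONNECT_ prefix
    CONNECT_BOOTSTRAP_SERVERS: broker:9092
    CONNECT_OFFSET_STORAGE_FILE_FILENAME: /tmp/connect.offsets
```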

Sultan of Swing
    Standalone mode may use only local offset file. Read this https://docs.confluent.io/home/connect/userguide.html#standalone-mode . And `local` means not on a topic at the kafka brokers. AND `INFO` is not an error. – J. Song Dec 08 '21 at 00:42
  • @J.Song I understand it uses a local offset file. So to me it makes sense that there was no committed offset, because it's a local temporary file that is "empty". So how can I start consuming topics if there is no committed offset? How would one ever run in standalone mode for testing purposes if there is never a last known offset? I'm not sure I understand your last sentence though. – Sultan of Swing Dec 08 '21 at 03:11
  • As answered before, I see no reason to use this specific container over the existing Confluent ones that run distributed mode. Why do you not want to use them? – OneCricketeer Dec 08 '21 at 03:46
  • Regarding your mentioned error, `CONNECTOR_` is for the **connector** standalone properties file (BQ Sink). `CONNECT_` is for the **worker** (where `offset.storage.file.name` property actually exists) – OneCricketeer Dec 08 '21 at 03:47
  • @OneCricketeer I'm still learning a lot about Kafka and how it works, I do understand standalone vs. distributed but I understand standalone is very useful for local testing, which is what I'm doing. I am running a POC and have to learn as I go here. Can you point me to something out of the box that is easy to run in Docker that I can configure with AWS MSK IAM authentication, will allow me to run on my local machine so I can test, and I can use with the wepay BigQuery connector that has already been created? I would like to compare the differences so I can fully understand. – Sultan of Swing Dec 08 '21 at 03:51
  • @OneCricketeer I'm trying to understand the prefixes, do I have to create the local offset file for the BQ Sink (consumer) or the worker? – Sultan of Swing Dec 08 '21 at 03:52
  • I'm just trying to understand why my Sink cannot create an offset locally in that file in `/tmp`... – Sultan of Swing Dec 08 '21 at 03:57
  • Distributed mode works fine locally, and will match what you'd run in production, anyway. Since you're not setting breakpoints and debugging anything, just use distributed mode... Any Docker container can be configured with IAM JAR files (it's certainly not part of the repo you're using), so I don't see how that is related to the error you're getting – OneCricketeer Dec 08 '21 at 04:22
  • And also, you really don't need Docker for local testing. You can download Kafka itself, which includes Connect scripts, and not mess with container filesystems at all. It'd be easier to modify the classpath for Connect process that way, too. – OneCricketeer Dec 08 '21 at 04:24
  • For the project I am working on, Docker is required – Sultan of Swing Dec 08 '21 at 04:30
  • Yes I have configured it with the IAM jars. Again the only issue is how to get this offsetting to work. I just need to use docker and this repo was suggested to me and seems relatively straightforward. Not sure why it's not working – Sultan of Swing Dec 08 '21 at 04:31
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/239933/discussion-between-sultan-of-swing-and-onecricketeer). – Sultan of Swing Dec 08 '21 at 04:54

1 Answer


I think you are focused on the wrong output.

  1. That is an INFO message
  2. The offsets file (or topic in distributed mode) is for source connectors.

Sink connectors use consumer groups. If no committed offset is found for `groupId=connect-bq-sink-connector`, then the consumer group simply hasn't committed one yet.
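To make the distinction concrete, here is a rough sketch of where each setting lives in standalone mode (file names, the topic name, and values are illustrative, following the usual `connect-standalone worker.properties connector.properties` layout):

```properties
# worker.properties — the worker config (CONNECT_-prefixed in the Compose file)
bootstrap.servers=broker:9092
# Only used to track *source* connector offsets; sink connectors ignore it
offset.storage.file.filename=/tmp/connect.offsets

# bq-sink.properties — the connector config (CONNECTOR_-prefixed)
name=bq-sink-connector
connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
topics=your-topic
```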

OneCricketeer
  • So how can I "ask" my consumer group for the sink connector, namely `connect-bq-sink-connector`, to commit an offset so the consumer can begin to take in messages from the topics? I'm guessing that's what's needed to make this work? I thought running in distributed mode would help, but I get an actual error when doing that. – Sultan of Swing Dec 09 '21 at 04:20
  • Offset commits are typically done automatically in sink connectors, and you should grep your logs for `ERROR` or maybe `WARN` to find instances where it doesn't – OneCricketeer Dec 09 '21 at 04:27
  • You can use `kafka-consumer-groups --describe --group connect-bq-sink-connector` to inspect the committed offsets – OneCricketeer Dec 09 '21 at 04:28
  • There aren't any other errors in the logs I believe. I can check for WARN. But what kind of things should I look for that would indicate no offset commits? And that command, should I be doing that in the kafka-connect container? – Sultan of Swing Dec 09 '21 at 04:41
  • That command can be run anywhere you have installed Kafka and can connect to a broker. IIRC, `Failed to commit` to `TimeoutException` come to mind. Obviously if there is no data in the topic when you start the connector, then there will be no offsets to commit, anyway. So, you should first start the connector, then produce some data. – OneCricketeer Dec 09 '21 at 06:20
  • unfortunately 1. I did not see any errors or warnings similar to `failed to commit` or `timeout` or `timeoutexception`., and 2. I also tried to run `kafka-consumer-groups --describe --group connect-bq-sink-connector` but required a bootstrap server, so instead I bashed into the `kafka-connect` container and ran `kafka-consumer-groups --describe --group connect-bq-sink-connector --bootstrap-server broker:9092` and received: `Connection to node 1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.` – Sultan of Swing Dec 10 '21 at 07:47
  • The Connect container is not a broker. You need to give the bootstrap server address that you've defined elsewhere for Connect to run – OneCricketeer Dec 12 '21 at 04:25
  • the bootstrap server of the cluster from which I consume topics you mean? as in, where my topics are coming from (the data source) for me to ingest? – Sultan of Swing Dec 12 '21 at 04:28
  • That's correct. `localhost` is the Connect container, not the Kafka container that exposes port 9092 – OneCricketeer Dec 12 '21 at 04:38
  • So if the other Kafka cluster is hosted elsewhere, I need to give that address, like abc-123.something.aws.com:1234. And if there are multiple, I just separate them by commas? The reason why that doesn't make sense to me is because, is my Kafka connect cluster not a consumer basically? The Kafka cluster I use to get topics from is not managed by me and merely makes topics available. I run a separate cluster and add their multiple bootstrap servers to get data from – Sultan of Swing Dec 12 '21 at 04:44
  • I don't understand the question. It is a consumer. You'd need to give the same address if you want to use `kafka-console-consumer`. Simply, "localhost" is not Kafka, which explains the connection error in the CLI – OneCricketeer Dec 12 '21 at 05:12
  • I'll look up the command in the docs, I'm not understanding what to put in for bootstrap server – Sultan of Swing Dec 12 '21 at 05:14
  • It'd be `broker:9092`, which you can find by looking at the compose file https://github.com/entechlog/kafka-examples/blob/master/kafka-connect-standalone/docker-compose.yml#L63 – OneCricketeer Dec 12 '21 at 05:15
  • The line you mentioned is where the comma separated externally hosted bootstrap servers go. I'll play around but I think I already tried putting `broker:9092` in the command you mentioned. See my comments earlier. Unless I'm not understanding. If I'm not then you'll have to dumb it down for me, I know you are clearly well versed in Kafka but I am not. I have only started this stuff in the last week and I'm completely lost. I'm not sure what I'm trying to do with these commands as I'm just wondering why no offsets are being committed in my local temp file I defined in the docker compose file. – Sultan of Swing Dec 12 '21 at 05:22
  • The only information you've given in the question is the compose file, so `KAFKA_ADVERTISED_LISTENERS` uses `broker:9092`. I cannot guess what external bootstrap server address you've used if you changed the file. Regarding files, I've already answered that. Files aren't used here. The only files that would be created are the actual topic data on the broker side, nothing in the Connect container because you're using a sink connector. https://stackoverflow.com/questions/42072854/kafka-connect-sink-task-ignores-file-offset-storage-property#52239083 – OneCricketeer Dec 12 '21 at 14:09
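Putting the thread together, the inspection being discussed would look something like this (the container name is illustrative; `--bootstrap-server` must point at the actual broker address the worker uses — `broker:9092` in the linked compose file, or your external MSK bootstrap string — never `localhost`, since the Connect container itself is not a broker):

```shell
# From the host: open a shell in the Connect container (container name is a placeholder)
docker exec -it kafka-connect bash

# Inside the container: describe the sink's consumer group against the real broker
kafka-consumer-groups --bootstrap-server broker:9092 \
  --describe --group connect-bq-sink-connector
```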