I've never used kafka before. I have two test Go programs accessing a local kafka instance: a reader and a writer. I'm trying to tweak my producer, consumer, and kafka server settings to get a particular behavior.
My writer:
package main
import (
"fmt"
"math/rand"
"strconv"
"time"
"github.com/confluentinc/confluent-kafka-go/kafka"
)
func main() {
rand.Seed(time.Now().UnixNano())
topics := []string{
"policymanager-100",
"policymanager-200",
"policymanager-300",
}
progress := make(map[string]int)
for _, t := range topics {
progress[t] = 0
}
producer, err := kafka.NewProducer(&kafka.ConfigMap{
"bootstrap.servers": "localhost",
"group.id": "0",
})
if err != nil {
panic(err)
}
defer producer.Close()
fmt.Println("producing messages...")
for i := 0; i < 30; i++ {
index := rand.Intn(len(topics))
topic := topics[index]
num := progress[topic]
num++
fmt.Printf("%s => %d\n", topic, num)
msg := &kafka.Message{
Value: []byte(strconv.Itoa(num)),
TopicPartition: kafka.TopicPartition{
Topic: &topic,
},
}
err = producer.Produce(msg, nil)
if err != nil {
panic(err)
}
progress[topic] = num
time.Sleep(time.Millisecond * 100)
}
fmt.Println("DONE")
}
There are three topics that exist on my local kafka: policymanager-100, policymanager-200, policymanager-300. They each only have 1 partition to ensure all messages are sorted by the time kafka receives them. My writer will randomly pick one of those topics and issue a message consisting of a number that increments solely for that topic. When it's done running, I expect the queues to look something like this (topic names shortened for legibility):
100: 1 2 3 4 5 6 7 8 9 10 11
200: 1 2 3 4 5 6 7
300: 1 2 3 4 5 6 7 8 9 10 11 12
So far so good. I'm trying to configure things so that any number of consumers can be spun up and consume these messages in order. By "in-order" I mean that no consumer should get message 2 for topic 100 until message 1 is COMPLETED (not just started). If message 1 for topic 100 is being worked on, consumers are free to consume from other topics that currently don't have a message being processed. If a message of a topic has been sent to a consumer, that entire topic should become "locked" until either a timeout assumes that the consumer failed or the consumer commits the message, then the topic is "unlocked" to have it's next message made available to be consumed.
My reader:
package main
import (
"fmt"
"time"
"github.com/confluentinc/confluent-kafka-go/kafka"
)
func main() {
count := 2
for i := 0; i < count; i++ {
go consumer(i + 1)
}
fmt.Println("cosuming...")
// hold this thread open indefinitely
select {}
}
func consumer(id int) {
c, err := kafka.NewConsumer(&kafka.ConfigMap{
"bootstrap.servers": "localhost",
"group.id": "0", // strconv.Itoa(id),
"enable.auto.commit": "false",
})
if err != nil {
panic(err)
}
c.SubscribeTopics([]string{`^policymanager-.+$`}, nil)
for {
msg, err := c.ReadMessage(-1)
if err != nil {
panic(err)
}
fmt.Printf("%d) Message on %s: %s\n", id, msg.TopicPartition, string(msg.Value))
time.Sleep(time.Second)
_, err = c.CommitMessage(msg)
if err != nil {
fmt.Printf("ERROR commiting: %+v\n", err)
}
}
}
From my current understanding, the way I'm likely to achieve this is by setting up my consumer properly. I've tried many different variations of this program. I've tried having all my goroutines share the same consumer. I've tried using a different group.id
for each goroutine. None of these was the right configuration to get the behavior I'm after.
What the posted code does is empty out one topic at a time. Despite having multiple goroutines, the process will read all of 100 then move to 200 then 300 and only one goroutine will actually do all the reading. When I let each goroutine have a different group.id
then messages get read by multiple goroutines which I would like to prevent.
My example consumer is simply breaking things up with goroutines but when I begin working this project into my use case at work, I'll need this to work across multiple kubernetes instances that won't be talking to each other so using anything that interacts between goroutines won't work as soon as there are 2 instances on 2 kubes. That's why I'm hoping to make kafka do the gatekeeping I want.