Regarding any high volume set on k8s. Be sure to tune your linux kernel (primarily related to the Linux Connection Tracker (Contrack) service. You will also expect to see non-zero tcp timeouts, retries, out of window acks, et al. Depending on which container networking implementation is used, there may be additional configuration changes required.
I will assume you are using containerd and NOT using docker networking (except obviously the container(s) within a pod)
The issue applies to ANY heavy IO pod: kafka, NiFi, mySQL, PostGreSQL, you name it.
The incident increases when:
- "high" volumes of cross pod (especially cross node) tcp connections occur
- additional errors if you have large (megabyte or larger) messages
Be aware of any other components using either the Pod or VM tcp stack (e.g. PVC software supporting NiFi persisted storage)