I have some .NET Core microservices running in my Kubernetes cluster (1.19.1), all with the Istio sidecar proxy (1.9.1). I am seeing flaky connection behavior when making calls to the microservice that connects to the external SQL cluster. When a connection failure happens, the sidecar logs show this:
Istio sidecar log:
[2021-05-26T15:00:04.585Z] "- - -" 0 UF,URX - - "-" 0 0 9909 - "-" "-" "-" "-" "11.11.11.11:1433" PassthroughCluster - 11.11.11.11:1433 100.96.13.10:51662 - -
[2021-05-26T15:00:04.585Z] "- - -" 0 UF,URX - - "-" 0 0 9910 - "-" "-" "-" "-" "22.22.22.22:1433" PassthroughCluster - 22.22.22.22:1433 100.96.13.10:59498 - -
[2021-05-26T15:00:04.491Z] "- - -" 0 UF,URX - - "-" 0 0 10003 - "-" "-" "-" "-" "22.22.22.22:1433" PassthroughCluster - 22.22.22.22:1433 100.96.13.10:59484 - -
[2021-05-26T15:00:04.491Z] "- - -" 0 UF,URX - - "-" 0 0 10003 - "-" "-" "-" "-" "33.33.33.33:1433" PassthroughCluster - 33.33.33.33:1433 100.96.13.10:51648 - -
[2021-05-26T15:00:04.491Z] "- - -" 0 UF,URX - - "-" 0 0 10003 - "-" "-" "-" "-" "44.44.44.44:1433" PassthroughCluster - 44.44.44.44:1433 100.96.13.10:58482 - -
[2021-05-26T15:00:04.585Z] "- - -" 0 UF,URX - - "-" 0 0 10001 - "-" "-" "-" "-" "44.44.44.44:1433" PassthroughCluster - 44.44.44.44:1433 100.96.13.10:58496 - -
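For reference, this is how I am reading those entries. It is a minimal Python sketch that assumes the sidecar is using Istio's default access log format, so the field positions are my assumption and not something I have confirmed against the mesh config. The helper name parse_sidecar_line is only for illustration, and the flag descriptions are paraphrased from Envoy's access log documentation.

```python
import shlex

# Meanings paraphrased from Envoy's access log documentation.
RESPONSE_FLAGS = {
    "UF": "upstream connection failure",
    "URX": "retry limit (HTTP) or maximum connect attempts (TCP) reached",
}


def parse_sidecar_line(line):
    """Pull a few fields out of one access log line.

    Assumes Istio's default log format; the token positions below
    would be wrong for a customized format.
    """
    tokens = shlex.split(line)  # keeps quoted fields such as "- - -" together
    return {
        "flags": tokens[3].split(","),
        "duration_ms": int(tokens[9]),
        "upstream_host": tokens[15],
        "cluster": tokens[16],
    }


# First line of the sidecar log above, copied verbatim.
sample = (
    '[2021-05-26T15:00:04.585Z] "- - -" 0 UF,URX - - "-" 0 0 9909 - '
    '"-" "-" "-" "-" "11.11.11.11:1433" PassthroughCluster - '
    '11.11.11.11:1433 100.96.13.10:51662 - -'
)

entry = parse_sidecar_line(sample)
for flag in entry["flags"]:
    print(flag, "=", RESPONSE_FLAGS.get(flag, "unknown flag"))
print(entry["upstream_host"], "via", entry["cluster"],
      "failed after", entry["duration_ms"], "ms")
```

Read this way, every failing entry has the UF,URX flags, zero bytes in either direction, and a duration of roughly 10 seconds, which looks to me like the proxy timing out while connecting to that particular IP.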
App log exception:
Unhandled exception: A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 35 - An internal exception was caught)
System.Data.SqlClient.SqlException (0x80131904): A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 35 - An internal exception was caught)
Note on the SQL cluster: in the app config we use the DNS name of the availability group listener (e.g. ag_listener.mydomain.com) to point to the HA SQL cluster.
This all works in our nonprod environment with no issues. We run Istio there too, but nonprod has only a single SQL instance.
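Since the main difference from nonprod is the number of IPs behind the listener name, one check I can run is to resolve the name and attempt a plain TCP connect to each address on port 1433. The following is a minimal Python sketch of that check. The function names (resolve_ipv4, can_connect, probe) are hypothetical, and the 3-second timeout is an arbitrary choice. As far as I understand, it has to run from a pod without the sidecar (or from a node), because inside an injected pod the connect would complete against the local proxy regardless of whether the SQL node is reachable.

```python
import socket


def resolve_ipv4(hostname, port):
    """Return the distinct IPv4 addresses DNS gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def can_connect(ip, port, timeout=3.0):
    """Return True if a TCP connection to ip:port opens within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(hostname, port=1433, resolver=resolve_ipv4, connector=can_connect):
    """Map each resolved address to whether a TCP connect succeeded."""
    return {ip: connector(ip, port) for ip in resolver(hostname, port)}
```

Calling probe("ag_listener.mydomain.com") returns a dict keyed by IP address. If only some of the addresses accept connections, that would line up with the failures in the sidecar log being tied to specific IPs.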
Currently, I have outboundTrafficPolicy set to ALLOW_ANY, but I am still seeing this flaky connection behavior. It doesn't happen all the time, but it is highly inconsistent, and it has been a real pain for my team to resolve. Is there a proper way in Istio to handle connections to an MSSQL DB cluster with multiple IPs? Thank you.
Additional note: I have tried the following ServiceEntry without any luck:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: prod-sql-service-entry
spec:
  addresses:
  - 11.11.11.11/32
  - 22.22.22.22/32
  - 33.33.33.33/32
  - 44.44.44.44/32
  hosts:
  - '*.mydomain.com'
  location: MESH_EXTERNAL
  ports:
  - name: tcp
    number: 1433
    protocol: TCP