JFrog service is going down frequently

Question

Our JFrog service goes down frequently. We have configured a crontab entry that brings JFrog back up immediately.

The problem is that the outages still impact our Jenkins builds, and there are no errors in the logs apart from those in router-service.log.

The router-service log is included below.

2022-10-13T18:53:50.522Z [jfrou] [ERROR] [03d6f1c55eb36f37] [external_topology.go:79       ] [main                ] [] - Failed fetching external topology from Access: Get "http://localhost:8040/access/api/v1/topology": context deadline exceeded
2022-10-13T18:53:55.093Z [jfrou] [ERROR] [34485554c5ada198] [local_topology.go:134         ] [main                ] [] - periodic send heartbeat failed for 175 consecutive times. Last error: failed sending heartbeat information to Access: failed closing Access grpc client: closing heartbeat client and waiting for response timed-out
2022-10-13T18:53:55.197Z [jfrou] [WARN ] [226f3a72430aaafe] [local_topology.go:274         ] [main                ] [] - Readiness test failed with the following error: "required node services are missing or unhealthy"
2022-10-13T18:54:00.199Z [jfrou] [ERROR] [226f3a72430aaafe] [local_topology.go:134         ] [main                ] [] - periodic send heartbeat failed for 176 consecutive times. Last error: failed sending heartbeat information to Access: failed closing Access grpc client: closing heartbeat client and waiting for response timed-out
2022-10-13T18:54:00.302Z [jfrou] [WARN ] [4081475d82aceec3] [local_topology.go:274         ] [main                ] [] - Readiness test failed with the following error: "required node services are missing or unhealthy"
2022-10-13T18:54:05.304Z [jfrou] [ERROR] [4081475d82aceec3] [local_topology.go:134         ] [main                ] [] - periodic send heartbeat failed for 177 consecutive times. Last error: failed sending heartbeat information to Access: failed closing Access grpc client: closing heartbeat client and waiting for response timed-out
2022-10-13T18:54:05.407Z [jfrou] [WARN ] [492a2b85447ed5b2] [local_topology.go:274         ] [main                ] [] - Readiness test failed with the following error: "required node services are missing or unhealthy"
2022-10-13T18:54:05.623Z [jfrou] [ERROR] [32aa341a8a54b5e8] [external_topology.go:79       ] [main                ] [] - Failed fetching external topology from Access: Get "http://localhost:8040/access/api/v1/topology": context deadline exceeded
2022-10-13T18:54:10.409Z [jfrou] [ERROR] [492a2b85447ed5b2] [local_topology.go:134         ] [main                ] [] - periodic send heartbeat failed for 178 consecutive times. Last error: failed sending heartbeat information to Access: failed closing Access grpc client: closing heartbeat client and waiting for response timed-out
2022-10-13T18:54:10.513Z [jfrou] [WARN ] [59da1d3d86839010] [local_topology.go:274         ] [main                ] [] - Readiness test failed with the following error: "required node services are missing or unhealthy"
2022-10-13T18:54:15.514Z [jfrou] [ERROR] [59da1d3d86839010] [local_topology.go:134         ] [main                ] [] - periodic send heartbeat failed for 179 consecutive times. Last error: failed sending heartbeat information to Access: failed closing Access grpc client: closing heartbeat client and waiting for response timed-out
2022-10-13T18:54:15.617Z [jfrou] [WARN ] [034ba642a0140355] [local_topology.go:274         ] [main                ] [] - Readiness test failed with the following error: "required node services are missing or unhealthy"
2022-10-13T18:54:20.619Z [jfrou] [ERROR] [034ba642a0140355] [local_topology.go:134         ] [main                ] [] - periodic send heartbeat failed for 180 consecutive times. Last error: failed sending heartbeat information to Access: failed closing Access grpc client: closing heartbeat client and waiting for response timed-out

Can anyone tell us what the possible causes of this issue are?

Jyothi Prasad · Answer 1 · 2022-10-17T08:10:34.023

Based on the error snippet, the problem appears to be with the Access service: it is not accepting connections from the router, so the requests time out.
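To get a rough idea of how long the router has been unable to reach Access, the consecutive-failure counter in the heartbeat lines can be extrapolated backwards. The following is a minimal Python sketch, not a JFrog tool: the function names are hypothetical, and it assumes the counter increases by one per heartbeat attempt and that attempts happen at a roughly constant interval, as the quoted log suggests.

```python
import re
from datetime import datetime

# Matches router-service.log heartbeat lines like the ones quoted in the question.
_HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\S+Z) .*periodic send heartbeat failed for (?P<count>\d+) consecutive times"
)


def parse_heartbeat_failures(lines):
    """Return (timestamp, consecutive_failure_count) pairs found in the given lines."""
    samples = []
    for line in lines:
        match = _HEARTBEAT_RE.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            samples.append((ts, int(match.group("count"))))
    return samples


def estimate_failure_start(samples):
    """Estimate when failure number 1 happened, or None if there is too little data.

    Assumption: the counter grows by one per attempt at a constant interval.
    """
    if len(samples) < 2:
        return None
    (first_ts, first_count), (last_ts, last_count) = samples[0], samples[-1]
    if last_count <= first_count:
        return None
    interval = (last_ts - first_ts) / (last_count - first_count)
    return first_ts - interval * (first_count - 1)
```

Feeding it the lines of router-service.log (for example an open file object) and comparing the estimated start time with the timestamps in access-service.log can help narrow down which entries are relevant.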

First, check access-service.log, which should show the underlying errors.

This can happen when the Access service is overloaded and its connections are exhausted. One thing to check is whether any CI user is configured with an expired or incorrect password, since frequent retries from such a user can put a heavy load on the Access service.
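As a starting point for reading access-service.log, a small script can tally WARN and ERROR lines by the code location that emitted them. This is a minimal Python sketch under stated assumptions: `summarize_problems` is a hypothetical helper, and the line layout (bracketed fields, then ` - `, then the message) is inferred only from the log excerpts quoted in this thread.

```python
import re
from collections import Counter

_FIELD_RE = re.compile(r"\[([^\]]*)\]")   # one bracketed header field
_SOURCE_RE = re.compile(r"^\S+:\d+$")     # e.g. "local_topology.go:134"


def summarize_problems(lines, levels=("WARN", "ERROR")):
    """Count log lines per (level, source) for the given levels.

    Assumed layout: timestamp, then bracketed fields where the second field
    is the level, followed by " - " and the free-text message.
    """
    counts = Counter()
    for line in lines:
        header, sep, _message = line.partition(" - ")
        if not sep:
            continue  # not a log line in the assumed layout
        fields = [field.strip() for field in _FIELD_RE.findall(header)]
        if len(fields) < 2 or fields[1] not in levels:
            continue
        # The source is the first later field that looks like "name:line".
        source = next((f for f in fields[2:] if _SOURCE_RE.match(f)), "unknown")
        counts[(fields[1], source)] += 1
    return counts
```

Passing an open log file to `summarize_problems` and printing `most_common(10)` on the result shows which sources dominate. A source that repeats far more often than the others is a reasonable place to start looking.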

score 1 · Answer 2 · answered Mar 09 '23 at 13:43

We were seeing exactly the same issue on Artifactory 7.49.3. A few seconds after the context deadline exceeded errors were logged, we saw the following additional entries in access-service.log (the log file that Jyothi Prasad's answer suggests checking):

2023-02-28T20:08:50.321Z [jfac ] [WARN ] [c.z.h.p.HikariPool:787] [P Unique housekeeper] - HikariCP Unique - Thread starvation or clock leap detected (housekeeper delta=1m39s92ms958µs747ns).
2023-02-28T20:08:50.317Z [jfac ] [WARN ][c.z.h.p.HikariPool:787] [iCP Main housekeeper] - HikariCP Main - Thread starvation or clock leap detected (housekeeper delta=1m39s89ms837µs323ns).

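These warnings show that the HikariCP housekeeper threads were not scheduled for about 1m39s, which in our case meant the machine had too little CPU power. We increased our CPU cores from 2 to 8 and have not had the issue since.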
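To see how bad the stalls are over time, the `housekeeper delta` value in these warnings can be converted to seconds. This is a minimal Python sketch: `housekeeper_delta_seconds` is a hypothetical helper, and the duration format (unit suffixes h, m, s, ms, µs, ns) is assumed from the two lines quoted above.

```python
import re

# Seconds per unit suffix; "us" stands in for both micro-sign spellings.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
_DELTA_RE = re.compile(r"housekeeper delta=([0-9a-zµμ]+)\)")
# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
_PART_RE = re.compile(r"(\d+)(ms|[µμu]s|ns|h|m|s)")


def housekeeper_delta_seconds(line):
    """Return the housekeeper delta of a HikariCP warning in seconds, or None."""
    match = _DELTA_RE.search(line)
    if match is None:
        return None
    total = 0.0
    for value, unit in _PART_RE.findall(match.group(1)):
        if unit in ("µs", "μs"):
            unit = "us"
        total += int(value) * _UNIT_SECONDS[unit]
    return total
```

Applied to every line of access-service.log, this gives a list of stall durations. Large values that cluster around the times the router reports heartbeat failures would be consistent with the CPU starvation described here.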
