
I am using the Google Natural Language content classification API.
I am authenticating through a service account `.json` key file, whose path is exposed in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
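For context, the setup looks roughly like this (a sketch; normally the variable is set in the shell or Dockerfile rather than in code, and the path is a placeholder):

```python
import os
from google.cloud import language_v1

# Point the client library at the service account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/PATH/TO/MY-JSON_KEY.json"

# The client reads GOOGLE_APPLICATION_CREDENTIALS when it is constructed.
client = language_v1.LanguageServiceClient()
```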

There is no issue when I run my classification script as a single instance.
However, when I run it in parallel (4, 6, 8, or 10 Docker containers on one machine), I occasionally get the error below:

[Errno 24] Too many open files: '/PATH/TO/MY-JSON_KEY.json'

I have read related issues which suggest increasing ulimit, but that seems more like a way to sidestep the underlying problem (a sketch of that workaround is below).
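Raising the limit from inside the process would look something like this (it only raises the ceiling; the leak itself remains):

```python
import resource

# Raise this process's soft open-file limit to the hard maximum.
# This postpones [Errno 24] but does not fix whatever is leaking descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```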

It seems like the Google client library might be opening the credentials file on each API call but never closing it?
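One way to check would be to watch the open-descriptor count between calls (a Linux-only sketch, which is fine inside a Docker container; `open_fd_count` is just an illustrative helper):

```python
import os

def open_fd_count():
    # Each entry in /proc/self/fd is a descriptor held by this process;
    # if the count grows with every API call, something is leaking.
    return len(os.listdir("/proc/self/fd"))
```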

UPDATE
This is a longer error message that I managed to retrieve:

google.auth.exceptions.TransportError: HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 24] Too many open files'))

cryanbhu
  • Did you get any solution for this? – sid8491 Sep 16 '19 at 10:21
  • I had a workaround but not a solution. What I did was run many containers concurrently that called the API, so I could process the data faster. – cryanbhu Sep 29 '19 at 06:14
  • So each container on your one instance has its own json file? – Brendan Oct 21 '19 at 02:19
  • Yes, each container has the `.json` file copied into it as part of the `Dockerfile` steps. They are identical; I just start many of them by running `docker run ...` repeatedly. – cryanbhu Dec 01 '19 at 12:14

2 Answers


I tried `del client` and then later called `gc.collect()`. It worked for me :)
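In code, that pattern looks roughly like this (a sketch, assuming google-cloud-language v2+; `classify_batch` and the batching are illustrative):

```python
import gc
from google.cloud import language_v1

def classify_batch(texts):
    # Create the client for this batch only.
    client = language_v1.LanguageServiceClient()
    results = []
    for text in texts:
        document = language_v1.Document(
            content=text, type_=language_v1.Document.Type.PLAIN_TEXT
        )
        results.append(client.classify_text(document=document))
    # Drop the reference and force a collection pass so the client's
    # underlying resources (and any open descriptors) are released.
    del client
    gc.collect()
    return results
```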

kiki

I think I ran into the same issue recently. The problem seems to be that the Natural Language API client is instantiated several times; each instantiation looks up and opens "/PATH/TO/MY-JSON_KEY.json", and the open descriptors accumulate until the limit is reached. Splitting the workload across multiple containers probably just delayed hitting that limit, so if the workload grows, the limit can be reached again even with multiple containers. I'd suggest you check how many times the following line is being called:

from google.cloud import language_v1
client = language_v1.LanguageServiceClient()

After making sure this is called only once, I didn't run into any issues. For more info, here's my post; hope it helps, mate.
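For illustration, a minimal sketch of the single-client pattern (again assuming google-cloud-language v2+; `classify` is a hypothetical helper):

```python
from google.cloud import language_v1

# Instantiate the client once, at import time, and reuse it everywhere,
# so the credentials file is opened once instead of once per call.
client = language_v1.LanguageServiceClient()

def classify(text):
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    return client.classify_text(document=document)
```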