I am working on a desktop application using the GCP Speech-to-Text API to perform streaming recognition. I'm using Python 3 and the Google client library, google.cloud.speech. I started from the transcribe streaming infinite Python sample and built around those concepts, and everything works nicely. I'm now trying to tackle the end-user access question, as I'm currently using a service account file for development purposes, which I'm obviously not willing to distribute to all users.
Basically, how do I give my end users access to the Speech-to-Text service for streaming recognition in the least intrusive way possible? I have no need for accessing storage or the like, as I don't access bucket contents and stream all audio directly. I actually don't need any user information, I only need GCP to process the STT requests and send me the responses.
I see 2 solutions that should work on paper, out of the three the documentation mentions (I leave out the service account file ones):
- API key
- OAuth2
API key
API keys sound like my dream option: it's simple, doesn't require user interaction past initial setup, I can manage such keys in GCP's console, and it should be able to grant access to what I need (as I effectively don't need any user info, an account is mostly irrelevant).
However, how to use an API key with the Google Speech client library totally eludes me. I can see a Pub/Sub Go example, but I can't find any mapping to Python. I'm not even 100% sure it can work, as the Go documentation for the option seems to note it only works for JSON-over-HTTP, and I believe the client library for Speech-to-Text is using gRPC.
Yet, at least with a JSON non-streaming recognize request, I can use such an API key, and successfully did so manually using cURL on the command-line. So I still have a little bit of hope, in case the gRPC restriction either isn't true or doesn't concern my use case.
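For reference, here is roughly what my manual cURL test amounts to, written as a minimal Python sketch using only the standard library. The function names build_recognize_request and recognize are mine (hypothetical), and the audio settings (LINEAR16, 16 kHz, en-US) are assumptions for illustration; the key is passed as the key query parameter of the v1 speech:recognize REST endpoint. This only covers the non-streaming JSON case, not the gRPC streaming one I actually need.

```python
import base64
import json
import urllib.parse
import urllib.request

RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(api_key, audio_bytes,
                            sample_rate_hz=16000, language_code="en-US"):
    """Build (without sending) a non-streaming recognize request using an API key."""
    body = {
        # Assumed audio format; adjust to match the actual recording.
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        # The REST API expects the raw audio as base64 text.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }
    query = urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url=f"{RECOGNIZE_URL}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recognize(api_key, audio_bytes):
    """Send the request and return the decoded JSON response."""
    request = build_recognize_request(api_key, audio_bytes)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling recognize("MY_API_KEY", raw_pcm_bytes) with a real key and a short audio buffer should be equivalent to the cURL call, so the key itself is clearly accepted by the service over JSON.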
This part's question can be summarized as: "how do I specify an API key using the Python SpeechClient?".
OAuth2
This sounds like my second-best option, as it uselessly asks the user for authentication while I don't actually need any personal data. Yet, I still have serious issues I'm struggling to overcome:
- How do I reliably store info that prevents me from having to re-ask the user for authorization every single time the app runs? google_auth_oauthlib's InstalledAppFlow doesn't seem to provide such a feature, so I'm rolling my own based on google.oauth2.credentials.Credentials.from_authorized_user_file(), after having saved the credentials with to_json() the first time I obtained them with InstalledAppFlow. I'm however confident this will not last, and I'm really not sure how to check whether the credentials are still good before I fail to use an API with them (e.g. I can't seem to be able to rely on Credentials.valid before they actually get used).
- There seem to be no specific Speech-to-Text scopes, and the required one is way broader than what I need, leading to an overly complex and frightening authorization request. And no, without this scope I cannot access the Speech-to-Text API, I tried :)
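To make the first point concrete, the persistence approach I'm rolling looks roughly like the sketch below. The names load_credentials and can_refresh_silently are mine (hypothetical), as are the file paths passed in; the Google calls are the ones mentioned above plus Credentials.refresh() with a google.auth.transport.requests.Request. Treating a RefreshError as "ask the user again" is my assumption about how a revoked or expired refresh token shows up, not something I have confirmed.

```python
import os


def can_refresh_silently(creds):
    """Return True if expired credentials could be renewed without user interaction."""
    return bool(creds and creds.expired and creds.refresh_token)


def load_credentials(token_path, client_secrets_path, scopes):
    """Load saved user credentials, refreshing or re-authorizing as needed."""
    # Imported lazily so the helper above stays usable without the Google packages.
    from google.auth.exceptions import RefreshError
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        creds = Credentials.from_authorized_user_file(token_path, scopes)

    if creds and creds.valid:
        return creds

    if can_refresh_silently(creds):
        try:
            creds.refresh(Request())
        except RefreshError:
            # Assumption: the refresh token was revoked or has expired.
            creds = None
    else:
        creds = None

    if creds is None:
        # Fall back to the interactive browser-based authorization.
        flow = InstalledAppFlow.from_client_secrets_file(client_secrets_path, scopes)
        creds = flow.run_local_server(port=0)

    # Save the (possibly refreshed) credentials for the next run.
    with open(token_path, "w") as token_file:
        token_file.write(creds.to_json())
    return creds
```

This avoids prompting on every run as long as the refresh token stays usable, but it still only tells me the credentials are bad at refresh time, and it does nothing about the overly broad scope.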
Summary
To summarize: what is my best and least-intrusive option to provide credentials to be able to use GCP Speech-to-Text in my desktop application?