
I am trying to ingest 1 million FHIR JSON files (each file is only a few bytes in size) into a FHIR store in a Google Cloud Healthcare API dataset. Ingestion is taking a very long time (more than an hour). Is there any way to speed up the Healthcare API?

Note: I also need to de-identify the data and export it to BigQuery, so the entire process takes more than 3 hours.
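For readers reproducing this, the three stages in the note (import, de-identify, export to BigQuery) can be sketched with the Healthcare API's Python discovery client (`google-api-python-client`). This is a minimal sketch under stated assumptions, not the codelab's code: every project, dataset, store, bucket and BigQuery name is a placeholder, the destination store for de-identification is assumed to exist already, `deid_config` is whatever de-identification config you already use, and `contentStructure` must match how your files are laid out (`RESOURCE` for newline-delimited resources, `BUNDLE` for bundles, or the `_PRETTY` variants for pretty-printed JSON). Recording how long each stage waits gives per-stage numbers that are useful when reporting slowness.

```python
"""Minimal sketch: import -> de-identify -> BigQuery export for a FHIR store.

Assumptions: google-api-python-client is installed, Application Default
Credentials are configured, and every project/dataset/store/bucket name
passed in is a placeholder you replace with your own.
"""
import time


def store_name(project: str, location: str, dataset: str, store: str) -> str:
    """Build the full resource name of a FHIR store."""
    return (f"projects/{project}/locations/{location}"
            f"/datasets/{dataset}/fhirStores/{store}")


def import_body(gcs_uri: str, content_structure: str = "RESOURCE") -> dict:
    # A wildcard URI such as "gs://bucket/fhir/*.json" covers many files.
    return {"contentStructure": content_structure, "gcsSource": {"uri": gcs_uri}}


def deidentify_body(destination_store: str, deid_config: dict) -> dict:
    # deid_config is your own DeidentifyConfig (e.g. the one from the codelab).
    return {"destinationStore": destination_store, "config": deid_config}


def export_body(bq_dataset_uri: str) -> dict:
    # bq_dataset_uri has the form "bq://PROJECT.BQ_DATASET".
    return {"bigqueryDestination": {
        "datasetUri": bq_dataset_uri,
        "schemaConfig": {"schemaType": "ANALYTICS"},
    }}


def wait(service, op: dict, poll_seconds: float = 30) -> tuple:
    """Poll a long-running operation; return (final_op, seconds_waited)."""
    start = time.monotonic()
    while not op.get("done"):
        time.sleep(poll_seconds)
        ops = service.projects().locations().datasets().operations()
        op = ops.get(name=op["name"]).execute()
    if "error" in op:
        raise RuntimeError(op["error"])
    return op, time.monotonic() - start


def run_pipeline(project, location, dataset, src_store, dst_store,
                 gcs_uri, bq_dataset_uri, deid_config) -> dict:
    """Run the three stages in order and return per-stage wait times."""
    from googleapiclient import discovery  # lazy import: only needed here

    service = discovery.build("healthcare", "v1")
    stores = service.projects().locations().datasets().fhirStores()
    src = store_name(project, location, dataset, src_store)
    dst = store_name(project, location, dataset, dst_store)  # assumed to exist

    timings = {}
    op = stores.import_(name=src, body=import_body(gcs_uri)).execute()
    timings["import"] = wait(service, op)[1]
    op = stores.deidentify(sourceStore=src,
                           body=deidentify_body(dst, deid_config)).execute()
    timings["deidentify"] = wait(service, op)[1]
    op = stores.export(name=dst, body=export_body(bq_dataset_uri)).execute()
    timings["export"] = wait(service, op)[1]
    return timings
```

The client maps the API's `import` method to `import_` because `import` is a Python keyword. Only the request bodies and polling are shown here; batching, retries and quota handling are left out.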

Thanks in advance

code tutorial
  • Your question has no details, such as what you are ingesting, how you are ingesting it, etc. Show your code, show benchmarks, etc. Why do you think it is slow (this is a very vague term)? – John Hanley Mar 01 '20 at 14:53
  • Hi @JohnHanley, I tried to implement the exercise https://codelabs.developers.google.com/codelabs/fhir-to-bq/index.html?index=..%2F..index#0. I have a large amount of FHIR data from an external source (in the form of 1 million JSON files), and I used the gcloud command to import it into the FHIR store, but it takes a long time. – code tutorial Mar 02 '20 at 12:53
  • The Google Cloud Healthcare team is looking into this - the codelab example is only supposed to take a few minutes. Are you using us-central1? – Paul Church Mar 02 '20 at 15:57
  • Yes, I am using us-central1 only. – code tutorial Mar 02 '20 at 17:54
  • @PaulChurch, how can I proceed with this scenario? Thanks in advance! – code tutorial Mar 04 '20 at 18:17
  • @codetutorial are you still having issues with the codelabs? – Daniel Ocando Mar 05 '20 at 10:32
  • @DanielOcando, the codelab uses a small amount of data and works absolutely fine. The problem is FHIR store ingestion with a larger amount of FHIR data. – code tutorial Mar 05 '20 at 12:26
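One way to produce the benchmarks requested in the first comment is to read the metadata of the finished import operation. The sketch below assumes the Healthcare API v1 operation metadata carries `createTime`, `endTime` and a `counter` with `success`/`failure` counts (int64 values serialized as strings); check these field names against your own operation response. The numbers in the usage lines are hypothetical, chosen to match the "1 million files in about an hour" from the question.

```python
from datetime import datetime, timezone


def parse_timestamp(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as '2020-03-02T12:53:00.123456789Z'.

    Fractional seconds are truncated or padded to microseconds, because
    datetime cannot represent the nanoseconds the API may return.
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    micros = int((frac + "000000")[:6])
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return dt.replace(microsecond=micros, tzinfo=timezone.utc)


def import_throughput(op: dict) -> dict:
    """Summarize a finished import operation's metadata as a benchmark."""
    meta = op["metadata"]
    seconds = (parse_timestamp(meta["endTime"])
               - parse_timestamp(meta["createTime"])).total_seconds()
    counter = meta.get("counter", {})
    success = int(counter.get("success", 0))  # int64 values arrive as strings
    failure = int(counter.get("failure", 0))
    rate = success / seconds if seconds > 0 else float("nan")
    return {"seconds": seconds, "success": success,
            "failure": failure, "resources_per_second": rate}


# Hypothetical operation: 1M resources imported in exactly one hour.
example_op = {"metadata": {"createTime": "2020-03-02T12:00:00Z",
                           "endTime": "2020-03-02T13:00:00Z",
                           "counter": {"success": "1000000"}}}
print(round(import_throughput(example_op)["resources_per_second"], 1))  # 277.8
```

A resources-per-second figure like this, together with the region and file layout, is the kind of detail that makes a "slow" report actionable.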

1 Answer


Some performance tips for bulk FHIR import in the Google Cloud Healthcare API:

  • Make sure your input GCS bucket is in the same region as the healthcare dataset. Cross-region imports will be slower.
  • Check your project quota. The relevant quota for bulk imports is "FHIR storage ingress in bytes per minute". You can request a quota increase if this becomes the limiting factor.
  • Performance may vary depending on the overall load in the region you are using. us-central1 is a very popular region because it's referenced in the codelab; you might achieve higher throughput elsewhere (see https://cloud.google.com/healthcare/docs/concepts/regions for available regions).
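The first two tips can be turned into a quick pre-flight check: compare the bucket's location with the dataset's region, and divide the total input size by the ingress quota to get a lower bound on import time. This is a minimal sketch; the helper names are hypothetical, the bucket lookup assumes `google-cloud-storage` and credentials are available, and the quota number in the usage lines is made up, so substitute the "FHIR storage ingress in bytes per minute" value shown for your project.

```python
def dataset_location(dataset_name: str) -> str:
    """'projects/P/locations/us-central1/datasets/D' -> 'us-central1'."""
    parts = dataset_name.split("/")
    return parts[parts.index("locations") + 1]


def same_region(bucket_location: str, dataset_name: str) -> bool:
    # GCS reports locations in upper case (e.g. "US-CENTRAL1"). A multi-region
    # bucket such as "US" is treated as a mismatch by this simple check.
    return bucket_location.lower() == dataset_location(dataset_name).lower()


def bucket_location(bucket_name: str) -> str:
    """Look up a bucket's location (needs google-cloud-storage + credentials)."""
    from google.cloud import storage  # lazy import: only needed here
    return storage.Client().get_bucket(bucket_name).location


def min_import_minutes(total_bytes: int, ingress_bytes_per_minute: int) -> float:
    """Lower bound on import time set by the ingress quota alone.

    Actual imports can take longer; this ignores regional load and any
    other limits.
    """
    return total_bytes / ingress_bytes_per_minute


# Hypothetical inputs: 1M files of about 2 KB each and a made-up quota value.
print(same_region("US-CENTRAL1", "projects/p/locations/us-central1/datasets/d"))  # True
print(min_import_minutes(1_000_000 * 2_000, 500_000_000))  # 4.0
```

If the measured import time is far above this bound, the quota is probably not the limiting factor, and the region or regional load from the third tip is worth investigating.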
Ajay Kharade
Paul Church
  • I will follow these steps. Thanks for the information. If there is any way to autoscale this process, that would be really helpful. – code tutorial Mar 06 '20 at 12:39
  • The service handles horizontal scaling of all parts of the backend automatically. If we can make it faster for you, we will. :-) – Paul Church Mar 06 '20 at 19:31
  • @PaulChurch Not sure how to roll my eyes more than I already am. Can one query or search for executeBundle info? I am interested in querying for all records changed by an executeBundle command. Is there some global version I can query for it? – debovis Sep 21 '21 at 13:45