Questions tagged [google-cloud-data-fusion]

Google Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. Data Fusion has a visual point-and-click interface, transformation blueprints, and connectors to make ETL pipeline development fast and easy. Cloud Data Fusion is based on the open-source CDAP project.

This tag can be added to any question about using or troubleshooting Google Cloud Data Fusion.

445 questions
9 votes, 1 answer

Can Google Data Fusion do the same data cleaning as Dataprep?

I want to run a machine learning model on some data. Before training the model on this data I need to process it, so I have been reading about ways to do that. First, create a Dataflow pipeline to upload it to BigQuery or Google Cloud Storage,…
6 votes, 3 answers

Can't connect Cloud Data Fusion with Google Cloud SQL for PostgreSQL

My goal is to read data from Cloud SQL Postgres to BigQuery via a Cloud Data Fusion pipeline. For this, I set up a Cloud Data Fusion instance and assigned the following two permissions to the service account: (see…
5 votes, 1 answer

Stopping Cloud Data Fusion Instance

I have production pipelines in Google Data Fusion that run for only a couple of hours. I would like to stop the Data Fusion instance and start it again the next day. I don't see an option to stop the instance. Is there any way we can stop the instance…
5 votes, 2 answers

Google Cloud Data Fusion -- building pipeline from REST API endpoint source

I am attempting to build a pipeline that reads from a third-party REST API endpoint data source. I am using the HTTP (version 1.2.0) plugin found in the Hub. The response request URL is: https://api.example.io/v2/somedata?return_count=false A sample of…
5 votes, 1 answer

Is it possible to schedule a job with Google Data Fusion and then delete the developer instance?

I'm evaluating Google Cloud Data Fusion to use for an internal project, and I want to be able to set up a Data Fusion instance, define and deploy a scheduled pipeline, then shut down the Data Fusion instance. However, when the instance is shut down,…
cbsalling
5 votes, 1 answer

BigQuery - Cannot read and write in different locations: source: EU, destination: US

I've created a basic instance in europe-west1-b. I am trying to join data from two BigQuery tables and write the results back to BigQuery. I got this error: java.io.IOException: Cannot read and write in different locations: source: EU, destination: US The…
5 votes, 1 answer

Dataproc deprovisioning issue while running a Data Fusion pipeline to load a CSV file from GCS to BigQuery

I am using Data Fusion to create a pipeline that loads CSV data from GCS into BigQuery. When I run the preview, it works fine. But when I deploy the pipeline, it gives me the error below. ERROR …
Mustaquim
4 votes, 1 answer

Run a Data Fusion pipeline only when a file exists

I already have a working pipeline in Data Fusion that performs the whole ETL process, but I need it to run only when it finds a file called SUCCESS.txt in a Cloud Storage bucket. Is this even possible? On other platforms I used a file watcher…
4 votes, 1 answer

GCP Data Fusion "No discoverable found" error

I'm trying to use GCP Data Fusion Basic Edition with the Private IP option, but when I try to create a pipeline, every action gives me this error: No discoverable found for request POST…
4 votes, 2 answers

How to permit Google Cloud Data Fusion to connect to an AWS RDS MySQL database?

I'm getting an error when configuring a database connection in a Google Cloud Data Fusion pipeline: "Encountered SQL error while getting query schema: Communications link failure The last packet sent successfully to the server was 0 milliseconds…
crazy8
4 votes, 1 answer

How do I configure a Cloud Data Fusion pipeline to run against existing Hadoop clusters?

Cloud Data Fusion creates a new Dataproc cluster for every pipeline run. I already have a Dataproc cluster set up that runs 24x7, and I would like to use that cluster to run pipelines.
Sree
4 votes, 2 answers

Permissions Issue with Google Cloud Data Fusion

I'm following the instructions in the Cloud Data Fusion sample tutorial and everything seems to work fine until I try to run the pipeline right at the end. Cloud Data Fusion Service API permissions are set for the Google-managed service account as…
Helvick
4 votes, 2 answers

Data Fusion Provisioning of Dataproc Cluster Fails

I've created a simple pipeline that reads from a SQL Server table and writes to a BigQuery table. I then configure it to use Spark, deploy it, and run it. It starts by provisioning the Dataproc cluster, and I can see that it relatively quickly creates 3…
Bjoern
4 votes, 1 answer

PROVISION task failed in REQUESTING_CREATE state

I am new to GCP and am trying to create a simple Data Fusion workflow to load a BigQuery table from a text file that resides in a GCS bucket. The workflow was deployed successfully. However, when I run the workflow, it fails in…
4 votes, 1 answer

Google Cloud Data Fusion - Dataproc provisioning stopping abruptly without any error message

I have designed a simple pipeline to read a CSV file from Cloud Storage and write to a BigQuery table. While the pipeline is running, the operation stops abruptly without any error message in the logs. I already have the required firewall rules. Please suggest…
Safiyur