
I am getting the error below while running a Gobblin job. My core-site.xml looks fine and contains the required value.

core-site.xml

<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  <description>The AbstractFileSystem for 'gs:' URIs.</description>
</property>

Error

org.apache.gobblin.runtime.ForkException: Fork branches [0] failed for task task_toGCPHIVE_1639057335724_14
<Fork 0>
java.lang.RuntimeException: Error creating writer
    at org.apache.gobblin.writer.PartitionedDataWriter$4.get(PartitionedDataWriter.java:214)
    at org.apache.gobblin.writer.PartitionedDataWriter$4.get(PartitionedDataWriter.java:207)
    at org.apache.gobblin.writer.CloseOnFlushWriterWrapper.<init>(CloseOnFlushWriterWrapper.java:73)
    at org.apache.gobblin.writer.PartitionedDataWriter.<init>(PartitionedDataWriter.java:206)
    at org.apache.gobblin.runtime.fork.Fork.buildWriter(Fork.java:562)
    at org.apache.gobblin.runtime.fork.Fork.buildWriterIfNotPresent(Fork.java:570)
    at org.apache.gobblin.runtime.fork.Fork.processRecord(Fork.java:516)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:103)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:86)
    at org.apache.gobblin.runtime.fork.Fork.run(Fork.java:250)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: fs.AbstractFileSystem.gs.impl=null: No AbstractFileSystem configured for scheme: gs
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:160)

I am able to run gs commands on the command line without any issues. For example, hadoop fs -ls gs://<<bucketName>> produces the expected output.

Any help would be appreciated.

Mikhail Berlyant
1stenjoydmoment

1 Answer


There are two possible solutions; if you are using Scala, PySpark, and/or Spark, both involve adjusting core-site.xml.

The first one is covered in: How to fix "No FileSystem for scheme: gs" in pyspark?

The second one is covered in: "No Filesystem for Scheme: gs" when running a Spark job locally.
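If the job runs on Spark, a common fix from those threads is to forward the Cloud Storage connector classes to the Hadoop configuration through Spark's `spark.hadoop.*` property prefix, so they reach every executor regardless of which core-site.xml is on the classpath. A minimal sketch for spark-defaults.conf (the class names are the standard ones shipped with the GCS connector):

```
# spark-defaults.conf (sketch): forward GCS settings into the Hadoop Configuration
spark.hadoop.fs.gs.impl                     com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
spark.hadoop.fs.AbstractFileSystem.gs.impl  com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
```

The same two keys can equally be passed on the command line with `--conf`.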

Finally, it could also be an issue with the Cloud Storage connector itself; I suggest reviewing the Cloud Storage connector documentation to make sure your settings were applied correctly.
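Note that the error message (`fs.AbstractFileSystem.gs.impl=null`) means the Hadoop `Configuration` object built at runtime never saw your core-site.xml, even though the `hadoop` CLI did. As a sanity check, the connector documentation expects both the `FileSystem` and the `AbstractFileSystem` entries for the `gs` scheme; your question shows only the latter. A minimal sketch of the two properties:

```xml
<!-- core-site.xml (sketch): both entries for the gs scheme -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  <description>The FileSystem for 'gs:' (classic Hadoop FileSystem API).</description>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  <description>The AbstractFileSystem for 'gs:' URIs.</description>
</property>
```

Also verify that this core-site.xml is on the classpath of the JVM that actually runs the Gobblin job, not just the one used by the `hadoop` CLI.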

Alva Santi