2

I am attempting to specify my GCS temp location by passing it as an option in the command-line as shown below.

java -jar pipeline-0.0.1-SNAPSHOT.jar --runner=DataflowRunner --project=<my_project> --tempLocation=gs://<my_bucket>/<my_folder>

However, I continue to receive a syntax error:

java.nio.file.InvalidPathException: Illegal char <:> at index 2: gs://<my_bucket>/<my_folder>

I'm referring to the following documentation:

https://cloud.google.com/dataflow/pipelines/specifying-exec-params

I specify that I am taking the argument from the command-line as such:

DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

Updated with the full stack trace as asked in questions below:

org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.nio.file.InvalidPathException: Illegal char <:> at index 2: gs://pipeline-az/staging
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:342)
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:312)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:206)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:62)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
        at com.autozone.google.pipeline.PipelinePeople.main(PipelinePeople.java:97)
Caused by: java.nio.file.InvalidPathException: Illegal char <:> at index 2: gs://pipeline-az/staging
        at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
        at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
        at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
        at sun.nio.fs.WindowsPath.parse(Unknown Source)
        at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
        at java.nio.file.Paths.get(Unknown Source)
        at org.apache.beam.sdk.io.LocalFileSystem.matchNewResource(LocalFileSystem.java:196)
        at org.apache.beam.sdk.io.LocalFileSystem.matchNewResource(LocalFileSystem.java:78)
        at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:544)
        at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:325)
        at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$4.getTempFilePrefix(BatchLoads.java:381)

My Pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.hendpro.google</groupId>
  <artifactId>pipeline</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>pipeline</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

<build>
  <plugins>
    <plugin>
      <!-- Build an executable JAR -->
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <version>3.0.2</version>
      <configuration>
        <archive>
          <manifest>
            <addClasspath>true</addClasspath>
            <classpathPrefix>lib/</classpathPrefix>
            <mainClass>com.hendpro.google.pipeline.PipelinePeople</mainClass>
          </manifest>
        </archive>
      </configuration>
    </plugin>
    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
    </plugin>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
</build>

  <dependencies>
  <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
   <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-core</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-io-jdbc</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-direct-java</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>42.1.1</version>
    </dependency>
  </dependencies>
</project>

Update:

I've also tried both direct runner as well as the dataflow runner and have tried with and without the following:

.as(DataflowPipelineOptions.class);
.as(DirectOptions.class);

Regardless of runner choice or declaration the error persists.

Adding Shaded jar list:

[INFO] --- maven-shade-plugin:3.1.0:shade (default) @ pipeline ---
[INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-core:jar:2.3.0 in the shaded jar.
[INFO] Including com.google.code.findbugs:jsr305:jar:3.0.1 in the shaded jar.
[INFO] Including com.github.stephenc.findbugs:findbugs-annotations:jar:1.3.9-1 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-core:jar:2.8.9 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-annotations:jar:2.8.9 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-databind:jar:2.8.9 in the shaded jar.
[INFO] Including org.slf4j:slf4j-api:jar:1.7.25 in the shaded jar.
[INFO] Including org.apache.avro:avro:jar:1.8.2 in the shaded jar.
[INFO] Including org.codehaus.jackson:jackson-core-asl:jar:1.9.13 in the shaded jar.
[INFO] Including org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13 in the shaded jar.
[INFO] Including com.thoughtworks.paranamer:paranamer:jar:2.7 in the shaded jar.
[INFO] Including org.apache.commons:commons-compress:jar:1.8.1 in the shaded jar.
[INFO] Including org.tukaani:xz:jar:1.5 in the shaded jar.
[INFO] Including org.xerial.snappy:snappy-java:jar:1.1.4 in the shaded jar.
[INFO] Including joda-time:joda-time:jar:2.4 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-io-jdbc:jar:2.3.0 in the shaded jar.
[INFO] Including org.apache.commons:commons-dbcp2:jar:2.1.1 in the shaded jar.
[INFO] Including org.apache.commons:commons-pool2:jar:2.4.2 in the shaded jar.
[INFO] Including commons-logging:commons-logging:jar:1.2 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:2.3.0 in the shaded jar.
[INFO] Including com.google.cloud.bigdataoss:gcsio:jar:1.4.5 in the shaded jar.
[INFO] Including com.google.apis:google-api-services-cloudresourcemanager:jar:v1-rev6-1.22.0 in the shaded jar.
[INFO] Including com.google.apis:google-api-services-storage:jar:v1-rev71-1.22.0 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-extensions-protobuf:jar:2.3.0 in the shaded jar.
[INFO] Including io.grpc:grpc-core:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.errorprone:error_prone_annotations:jar:2.0.11 in the shaded jar.
[INFO] Including io.grpc:grpc-context:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.instrumentation:instrumentation-api:jar:0.3.0 in the shaded jar.
[INFO] Including com.google.apis:google-api-services-bigquery:jar:v2-rev355-1.22.0 in the shaded jar.
[INFO] Including com.google.api:gax-grpc:jar:0.20.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.api:api-common:jar:1.1.0 in the shaded jar.
[INFO] Including com.google.auto.value:auto-value:jar:1.2 in the shaded jar.
[INFO] Including com.google.api:gax:jar:1.3.1 in the shaded jar.
[INFO] Including org.threeten:threetenbp:jar:1.3.3 in the shaded jar.
[INFO] Including com.google.cloud:google-cloud-core-grpc:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java-util:jar:3.2.0 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:2.7 in the shaded jar.
[INFO] Including com.google.apis:google-api-services-pubsub:jar:v1-rev10-1.22.0 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-cloud-pubsub-v1:jar:0.1.18 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-cloud-pubsub-v1:jar:0.1.18 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-iam-v1:jar:0.1.18 in the shaded jar.
[INFO] Including com.google.cloud.bigdataoss:util:jar:1.4.5 in the shaded jar.
[INFO] Including com.google.api-client:google-api-client-java6:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.api-client:google-api-client-jackson2:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.oauth-client:google-oauth-client:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.oauth-client:google-oauth-client-java6:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0 in the shaded jar.
[INFO] Including com.google.http-client:google-http-client-protobuf:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.http-client:google-http-client-jackson:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.cloud.datastore:datastore-v1-protos:jar:1.3.0 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-common-protos:jar:0.1.0 in the shaded jar.
[INFO] Including io.grpc:grpc-auth:jar:1.2.0 in the shaded jar.
[INFO] Including io.grpc:grpc-netty:jar:1.2.0 in the shaded jar.
[INFO] Including io.netty:netty-codec-http2:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-http:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-handler-proxy:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-socks:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-handler:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-buffer:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-common:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-transport:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-resolver:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-codec:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.grpc:grpc-stub:jar:1.2.0 in the shaded jar.
[INFO] Including io.grpc:grpc-all:jar:1.2.0 in the shaded jar.
[INFO] Including io.grpc:grpc-okhttp:jar:1.2.0 in the shaded jar.
[INFO] Including com.squareup.okhttp:okhttp:jar:2.5.0 in the shaded jar.
[INFO] Including com.squareup.okio:okio:jar:1.6.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf-lite:jar:1.2.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf-nano:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.protobuf.nano:protobuf-javanano:jar:3.0.0-alpha-5 in the shaded jar.
[INFO] Including com.google.cloud:google-cloud-core:jar:1.0.2 in the shaded jar.
[INFO] Including org.json:json:jar:20160810 in the shaded jar.
[INFO] Including com.google.cloud:google-cloud-spanner:jar:0.20.0-beta in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-cloud-spanner-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-cloud-spanner-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 in the shaded jar.
[INFO] Including junit:junit:jar:4.12 in the shaded jar.
[INFO] Including org.hamcrest:hamcrest-core:jar:1.3 in the shaded jar.
[INFO] Including com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 in the shaded jar.
[INFO] Including com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 in the shaded jar.
[INFO] Including com.google.auth:google-auth-library-appengine:jar:0.7.0 in the shaded jar.
[INFO] Including io.opencensus:opencensus-contrib-grpc-util:jar:0.7.0 in the shaded jar.
[INFO] Including io.opencensus:opencensus-api:jar:0.7.0 in the shaded jar.
[INFO] Including io.dropwizard.metrics:metrics-core:jar:3.1.2 in the shaded jar.
[INFO] Including com.google.api-client:google-api-client:jar:1.22.0 in the shaded jar.
[INFO] Including com.google.http-client:google-http-client:jar:1.22.0 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpclient:jar:4.0.1 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpcore:jar:4.0.1 in the shaded jar.
[INFO] Including commons-codec:commons-codec:jar:1.3 in the shaded jar.
[INFO] Including com.google.http-client:google-http-client-jackson2:jar:1.22.0 in the shaded jar.
[INFO] Including com.google.auth:google-auth-library-credentials:jar:0.7.1 in the shaded jar.
[INFO] Including com.google.auth:google-auth-library-oauth2-http:jar:0.7.1 in the shaded jar.
[INFO] Including com.google.guava:guava:jar:20.0 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java:jar:3.2.0 in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:1.1.33.Fork26 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1:jar:0.1.9 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-common-protos:jar:0.1.9 in the shaded jar.
[INFO] Including org.apache.beam:beam-runners-direct-java:jar:2.3.0 in the shaded jar.
[INFO] Including org.apache.beam:beam-runners-local-java-core:jar:2.3.0 in the shaded jar.
[INFO] Including org.postgresql:postgresql:jar:42.1.1 in the shaded jar.
[WARNING] grpc-google-common-protos-0.1.0.jar, proto-google-common-protos-0.1.9.jar, proto-google-longrunning-v1-0.1.11.jar define 28 overlapping classes:
[WARNING]   - com.google.longrunning.ListOperationsRequestOrBuilder
[WARNING]   - com.google.longrunning.ListOperationsRequest$Builder
[WARNING]   - com.google.longrunning.OperationsProto$1
[WARNING]   - com.google.longrunning.OperationOrBuilder
[WARNING]   - com.google.longrunning.ListOperationsResponseOrBuilder
[WARNING]   - com.google.longrunning.DeleteOperationRequestOrBuilder
[WARNING]   - com.google.longrunning.DeleteOperationRequest$1
[WARNING]   - com.google.longrunning.CancelOperationRequest$1
[WARNING]   - com.google.longrunning.GetOperationRequest
[WARNING]   - com.google.longrunning.Operation$2
[WARNING]   - 18 more...
[WARNING] grpc-google-common-protos-0.1.0.jar, proto-google-common-protos-0.1.9.jar define 352 overlapping classes:
[WARNING]   - com.google.api.Logging
[WARNING]   - com.google.api.Usage$1
[WARNING]   - com.google.rpc.ResourceInfoOrBuilder
[WARNING]   - com.google.api.AuthProvider$1
[WARNING]   - com.google.api.ProjectProperties$Builder
[WARNING]   - com.google.api.DocumentationProto
[WARNING]   - com.google.type.TimeOfDayOrBuilder
[WARNING]   - com.google.api.MonitoringOrBuilder
[WARNING]   - com.google.api.Authentication$Builder
[WARNING]   - com.google.api.Monitoring
[WARNING]   - 342 more...
[WARNING] beam-sdks-java-core-2.3.0.jar, beam-sdks-java-extensions-google-cloud-platform-core-2.3.0.jar define 3 overlapping classes:
[WARNING]   - org.apache.beam.sdk.util.AutoValue_DoFnAndMainOutput
[WARNING]   - org.apache.beam.sdk.util.package-info
[WARNING]   - org.apache.beam.sdk.util.AutoValue_ReleaseInfo
[WARNING] grpc-google-common-protos-0.1.0.jar, grpc-google-longrunning-v1-0.1.11.jar define 7 overlapping classes:
[WARNING]   - com.google.longrunning.OperationsGrpc$OperationsStub
[WARNING]   - com.google.longrunning.OperationsGrpc$1
[WARNING]   - com.google.longrunning.OperationsGrpc$OperationsFutureStub
[WARNING]   - com.google.longrunning.OperationsGrpc$OperationsImplBase
[WARNING]   - com.google.longrunning.OperationsGrpc$OperationsBlockingStub
[WARNING]   - com.google.longrunning.OperationsGrpc
[WARNING]   - com.google.longrunning.OperationsGrpc$MethodHandlers
[WARNING] maven-shade-plugin has detected that some class files are
[WARNING] present in two or more JARs. When this happens, only one
[WARNING] single version of the class is copied to the uber jar.
[WARNING] Usually this is not harmful and you can skip these warnings,
[WARNING] otherwise try to manually exclude artifacts based on
[WARNING] mvn dependency:tree -Ddetail=true and the above output.
[WARNING] See http://maven.apache.org/plugins/maven-shade-plugin/
HendPro12
  • 1,094
  • 3
  • 17
  • 50
  • According to the error, it seems like it's expecting a local path instead of a GCS url. If you are using a 2.X SDK try if it works with `gcpTempLocation` instead. – Guillem Xercavins Mar 13 '18 at 08:13
  • @GuillemXercavins I've tried and it doesnt accept it. "Exception in thread "main" java.lang.IllegalArgumentException: Class interface org.apache.beam.sdk.options.PipelineOptions missing a property named 'gcpTempLocation'" – HendPro12 Mar 13 '18 at 12:00
  • Can you post more of the stack trace? Nothing in your invocation seems as though it would be immediate cause for concern – Thomas Groh Mar 13 '18 at 23:02
  • @ThomasGroh the stack trace has been added to the original post. – HendPro12 Mar 14 '18 at 00:42
  • any luck? I am facing same issue. From eclipse IDE job is getting launched successfully, but not using command line. :( – ajit deshmukh Mar 14 '18 at 06:25
  • @ajitdeshmukh No luck yet. Yes, the same for me. It works fine in Eclipse. – HendPro12 Mar 14 '18 at 13:54
  • @ajitdeshmukh This link may help you but it did not resolve anything for me as I only have the apache beam dependencies (including for the runner): https://stackoverflow.com/questions/48689089/error-while-staging-packages-when-a-dataflow-job-is-launched-from-a-fat-jar – HendPro12 Mar 14 '18 at 18:55
  • Does pipeline-0.0.1-SNAPSHOT.jar contain all dependencies, or just your code? Can you include your pom.xml? – jkff Mar 14 '18 at 19:01
  • @jkff Added pom.xml to the post. The jar is an uberjar containing all dependencies created by the maven-shade-plugin. – HendPro12 Mar 14 '18 at 19:04
  • Can you check if your generated jar contains this class? https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemRegistrar.java – jkff Mar 14 '18 at 19:46
  • @jkff Added the list that gets added. I do not see it included. – HendPro12 Mar 14 '18 at 19:57
  • Thanks. According to that list, I think it should be there. Can you please "jar -tf pipeline.jar | grep GcsFileSystem"? – jkff Mar 14 '18 at 20:58
  • @jkff org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.class org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemRegistrar.class – HendPro12 Mar 14 '18 at 21:00
  • Hmm I'm not sure what's going on but it's likely related to the configuration of your Maven plugins. I'm no Maven expert, but perhaps the bundled jar is omitting the AutoService configuration for GcsFileSystem? See https://github.com/google/auto/tree/master/service - does the jar contain something like FileSystemRegistrar in META-INF/services/ ? Does that file mention GcsFileSystem? – jkff Mar 14 '18 at 23:09
  • @jkff I see org.apache.beam.sdk.io.FileSystemRegistrar under META-INF\services. The only thing inside that file is the name of the file. – HendPro12 Mar 15 '18 at 13:53
  • 1
    I think https://stackoverflow.com/questions/42540485/how-to-stop-maven-shade-plugin-from-blocking-java-util-serviceloader-initializat provides a solution to your issue. – jkff Mar 15 '18 at 17:38
  • @jkff Please post the answer from your last comment as the answer and I will mark it as such. Adding ServicesResourceTransformer to my maven shade plugin resolved the issue. – HendPro12 Mar 19 '18 at 13:26

1 Answers1

2

As explained in this answer, when you use the Maven Shade Plugin in conjunction with ServiceLoader for dependency injection, you should specify ServicesResourceTransformer in your pom.xml file:

<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>

Even if the plugin is relocating classes, this will ensure that every service file under META-INF/services of your dependencies is merged, without the need to declare them all.

Note: just posting this as a community wiki answer for now but I'll gladly delete it if @jkff posts his comment as an answer instead. All credit to @Tunaki and @jkff.

Guillem Xercavins
  • 6,938
  • 1
  • 16
  • 35