
I am running the WordCount program on Windows using Apache Beam with the DirectRunner. I can see output files being created in a temp folder (under src/main/resources/), but the write to the output file fails. Below is the code snippet:

p.apply("ReadMyFile", TextIO.read().from("src/main/resources/input.txt"))
                .apply(Regex.split(" "))
                .apply(Count.<String>perElement())
                .apply(ToString.elements())
                .apply(TextIO.write().to("src/main/resources/output.txt"));

Please let me know what format it expects for the output directory/file. Thanks in advance.

The error is as follows:

Caused by: java.lang.IllegalStateException: Unable to find registrar for i
    at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
    at org.apache.beam.sdk.io.FileSystems.matchResources(FileSystems.java:174)
    at org.apache.beam.sdk.io.FileSystems.delete(FileSystems.java:321)
    at org.apache.beam.sdk.io.FileBasedSink$Writer.cleanup(FileBasedSink.java:905)
    at org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles.processElement(WriteFiles.java:376)

  • When saying that a program fails, please always include the complete printout of the error. Just knowing that your program didn't work is not enough to help you fix it. – jkff Sep 17 '17 at 01:11
  • Adding Exception:Caused by: java.lang.IllegalStateException: Unable to find registrar for i at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447) at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111) at org.apache.beam.sdk.io.FileSystems.matchResources(FileSystems.java:174) at org.apache.beam.sdk.io.FileSystems.delete(FileSystems.java:321) at org.apache.beam.sdk.io.FileBasedSink$Writer.cleanup(FileBasedSink.java:905) at org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles.processElement(WriteFiles.java:376) – Anuroopa George Sep 19 '17 at 19:34
  • I have updated the question with the printout of the error. Thanks. – Anuroopa George Sep 19 '17 at 19:36

2 Answers


Beam currently doesn't handle Windows paths very well. See associated JIRAs, e.g. this one. Perhaps try specifying the absolute path with the file:// scheme?

jkff
  • Using file:// also didn't work. I was able to run it successfully on a Unix box. As you mentioned, this might be an issue with Windows. – Anuroopa George Sep 21 '17 at 18:46

Summary: you can use the "/" character as a stand-in for the drive the process is running on, e.g. if your output file is located at

"C:/myFile"

write

TextIO.write().to("/myFile")

Longer answer:

Even after the issue mentioned in jkff's answer (this one) was resolved, I could only make the way they specified work for input, not for output.

The javadoc in the LocalFileSystem class says

 * <p>Windows OS:
 *
 * <ul>
 *   <li>pom.xml
 *   <li>C:/Users/beam/Documents/pom.xml
 *   <li>C:\\Users\\beam\\Documents\\pom.xml
 *   <li>file:/C:/Users/beam/Documents/pom.xml
 *   <li>file:///C:/Users/beam/Documents/pom.xml
 * </ul>
 */

but none of these worked for the method

TextIO.write().to(String filenamePrefix)

However, using release version 2.12.0, I was able to write to a file on the same drive by using "/" as the root directory, i.e. instead of "C:/myDirectory/myFile", I used "/myDirectory/myFile". Of course, this way, you can only write to files on the same drive, but given that DirectRunner should only be used for testing, this might be good enough for many cases.
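If your output locations are given as absolute Windows paths, the drive-letter workaround can be applied mechanically. The following is only a minimal sketch: the class DriveRelativePath and the method toDriveRelative are hypothetical names invented for this illustration and are not part of Beam. It assumes the path is absolute and on the same drive the process runs on, and it does plain string manipulation, so it runs without Beam on the classpath.

```java
import java.util.regex.Pattern;

public class DriveRelativePath {
    // Matches a leading drive specifier such as "C:" or "d:".
    private static final Pattern DRIVE_PREFIX = Pattern.compile("^[A-Za-z]:");

    /**
     * Hypothetical helper (not a Beam API): turns an absolute Windows path
     * into a path rooted at "/", e.g. "C:/myDirectory/myFile" becomes
     * "/myDirectory/myFile". Assumes the path is absolute; the drive letter
     * is simply discarded, so the result refers to the current drive.
     */
    public static String toDriveRelative(String windowsPath) {
        // Normalize backslashes to forward slashes.
        String forward = windowsPath.replace('\\', '/');
        // Drop the drive specifier if one is present.
        String withoutDrive = DRIVE_PREFIX.matcher(forward).replaceFirst("");
        // Make sure the result starts at the root.
        return withoutDrive.startsWith("/") ? withoutDrive : "/" + withoutDrive;
    }

    public static void main(String[] args) {
        System.out.println(toDriveRelative("C:/myDirectory/myFile"));   // prints /myDirectory/myFile
        System.out.println(toDriveRelative("C:\\myDirectory\\myFile")); // prints /myDirectory/myFile
        // In a pipeline, the result would be passed as the filename prefix:
        // TextIO.write().to(toDriveRelative("C:/myDirectory/myFile"))
    }
}
```

Because the drive letter is thrown away, a path on a different drive would silently be redirected to the current one, so this is only suitable for local testing of the kind described above.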

llatrbng