1

I have Apache Flink job for parsing csv-files which works fine in from IntelliJ IDEA on Windows. But when I put my job (jar) in docker-container Apache Flink i have problems with permisson to file with class FileSource.forRecordStreamFormat(...). Inside the container i have file: /opt/flink/data/test2.csv. The permissions are ok (I can even changed file from my job). For fileName I used /opt/flink/data/test2.csv, //opt/flink/data/test2.csv, ///opt/flink/data/test2.csv.

Permissions:

# pwd
/opt/flink/data
# ls -ls
total 16088
 1204 -rwxrwxrwx 1 root root  1231979 Jan 24 15:54 test2.csv
14876 -rwxrwxrwx 1 root root 15231523 Jan 22 19:24 test3.csv
    8 -rwxrwxrwx 1 root root     6623 Jan 24 14:32 test_Home.xlsx

Docker-compose:

version: "2.2"
services:
  jobmanager:
    image: flink:1.16-java8
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
    volumes:
      - /c/Users/MGubina/Desktop/data:/opt/flink/data

  taskmanager:
    image: flink:1.16-java8
    depends_on:
      - jobmanager
    command: taskmanager
    scale: 1
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2

Part of job code:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        CsvReaderFormat<Product> csvFormat = CsvReaderFormat.forPojo(Product.class);

        FileSource<Product> csvSource =
//                FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(file)).build(); // firsrt version
                FileSource.forRecordStreamFormat(csvFormat, new Path(fileName)).build(); // second version

        DataStream<Product> csvInputStream = env.fromSource(csvSource, WatermarkStrategy.noWatermarks(), "csv-source");
...

Logs with exception:

Caused by: java.io.FileNotFoundException: File file:/opt/flink/data/test2.csv does not exist or the user running Flink ('flink') has insufficient permissions to access it.
        at org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:106)
        at org.apache.flink.connector.file.src.impl.StreamFormatAdapter.openStream(StreamFormatAdapter.java:157)
        at org.apache.flink.connector.file.src.impl.StreamFormatAdapter.createReader(StreamFormatAdapter.java:70)
        at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.checkSplitOrStartNext(FileSourceSplitReader.java:112)
        at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:65)
        at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
        at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:142)

I tried to use different ways of getting Path, but no luck this way. As long as I have in exception File file:/opt/flink/data/test2.csv does not exist I think that problem might be that in local fyle system in Docker (Unix-like)is needed path like file:///.

What can I do? Maybe I miss something?

Maria
  • 11
  • 2

1 Answers1

0

The issue seems to be that You are deploying two separate containers taskmanager and jobmanager, but the file is only available on jobmanager and not on taskmanager. Can You try to add the correct mounts to the task manager too and try again ?

Dominik Wosiński
  • 3,769
  • 1
  • 8
  • 22