I'm using Spark 2.4.8 with the gcs-connector from com.google.cloud.bigdataoss, version hadoop2-2.1.8. For development I'm using a Compute Engine VM with my IDE. I'm trying to consume some CSV files from a GCS bucket natively with Spark's .format("csv").load(...) functionality. Some files load successfully, but some do not; for those, I can see in the Spark UI that the load job runs forever until a timeout fires.
But the weird thing is that when I run the same application packaged as a fat JAR on a Dataproc cluster, all of the same files are consumed successfully.

What am I doing wrong?