
I am trying to read a CSV file kept on my local filesystem in UNIX. When I run the job in cluster mode, it is not able to find the CSV file.

In local mode, it can read both HDFS and file:/// files. However, in cluster mode, it can only read HDFS files.

Is there any suitable way to read it without copying it into HDFS?
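Roughly, the read looks like this (a Scala sketch with placeholder paths and SparkSession setup, not the exact job):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("read-csv-example")
      .getOrCreate()

    // Local mode: driver and executors run on one machine, so a
    // file:/// path resolves (placeholder path).
    val localDf = spark.read
      .option("header", "true")
      .csv("file:///home/me/data/my.csv")

    // Cluster mode: every executor can reach HDFS, so an hdfs path
    // works (placeholder path).
    val hdfsDf = spark.read
      .option("header", "true")
      .csv("hdfs:///data/my.csv")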


1 Answer


Remember that the executors need to be able to access the file, so you have to look at it from the executor nodes' point of view. Since you mention HDFS, the executor nodes must already have access to your HDFS cluster.

If you want the Spark cluster to access a local file, consider NFS, SMB, etc. However, something will end up copying the data.
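For example, if the same NFS/SMB share is mounted at an identical path on the driver and on every executor node, a plain file:/// read works in cluster mode as well (the mount point below is a made-up example):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("read-shared-csv")
      .getOrCreate()

    // Assumption: /mnt/shared is an NFS (or SMB) mount that exists at
    // the same path on the driver and on every executor node.
    val df = spark.read
      .option("header", "true")
      .csv("file:///mnt/shared/data/my.csv")

    df.show(5)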

I can update my answer if you add more details on your architecture.

jgp
  • So when I try it from Eclipse it works fine for both HDFS and the local filesystem; I just give the path (for local I use file:///C:\abc\my.txt, for HDFS I use /bin/app/my.txt) and both work. But when I run it in YARN client mode or cluster mode, neither works – mukesh dewangan Oct 14 '21 at 15:29
  • Since you have answered one of my questions, please take a look at another one of mine; I have been waiting a long time for a resolution: https://stackoverflow.com/questions/69016167/how-to-avoid-showing-some-secret-values-in-sparkui – mukesh dewangan Oct 14 '21 at 15:31
  • You can accept the answer if it helps ;) or give me more info to help by editing your question. On the second one, I looked at it already and had no clue then... – jgp Oct 15 '21 at 14:01
  • On the first one, can you suggest anything? Keeping the CSV file locally helps neither in cluster mode nor in client mode. The CSV location is not a NAS path, so not all the nodes can find the CSV file. Is there any other solution apart from keeping it in an HDFS path? – mukesh dewangan Oct 18 '21 at 06:44