My Spark driver needs to read a file line by line, in order. Here is what I am trying to avoid (pseudocode):
    if file-path starts with "hdfs://"
        read via HDFS API
    else
        read via native FS API
I think the following would do the trick, letting Spark deal with distinguishing between local and HDFS paths:
    JavaSparkContext sc = new JavaSparkContext(new SparkConf());
    List<String> lines = sc.textFile(path).collect();
Is it safe to assume that lines will be in order, i.e. that lines.get(0) is the first line of the file, lines.get(1) the second, and so on?
If not, any suggestions on how to guarantee order without explicitly checking the filesystem type?
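If the ordering needs to be guaranteed explicitly rather than relied upon, one defensive option (a sketch under the assumption of local mode, with a hypothetical class/method name) is to tag each line with its position via zipWithIndex, sort on that index, and then collect. Spark still resolves the filesystem from the URI scheme, so no hdfs:// check is needed:

```java
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class OrderedRead {

    // Reads a text file (local path or hdfs:// URI; Spark picks the
    // filesystem from the scheme) and returns its lines with the original
    // file order made explicit rather than assumed.
    public static List<String> readOrdered(String path) {
        SparkConf conf = new SparkConf()
                .setAppName("OrderedRead")
                .setMaster("local[*]"); // assumption: local mode for this sketch
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            return sc.textFile(path)
                    .zipWithIndex()          // (line, position-in-file)
                    .mapToPair(Tuple2::swap) // (position, line)
                    .sortByKey()             // enforce global line order
                    .values()
                    .collect();              // driver-side list, in order
        }
    }

    public static void main(String[] args) {
        readOrdered(args[0]).forEach(System.out::println);
    }
}
```

zipWithIndex does trigger an extra job when the RDD has more than one partition, so this costs more than a plain collect(); it trades a little work for an ordering guarantee that does not depend on how collect() assembles partitions.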