If you are using HDP (Hortonworks Data Platform) as your HDFS solution and want to load a large dataset into WSO2 ML through the NameNode's IPC port 8020, e.g. hdfs://hostname:8020/samples/data/wdbcSample.csv, you first need to get the data file onto HDFS. The following Java client does exactly that:
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploader {

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Connect to the NameNode over IPC.
        FileSystem hdfs = FileSystem.get(new URI("hdfs://hostname:8020"), configuration);

        // Destination path on HDFS; remove any stale copy before uploading.
        Path dstPath = new Path("hdfs://hostname:8020/samples/data/wdbcSample.csv");
        if (hdfs.exists(dstPath)) {
            hdfs.delete(dstPath, true);
        } else {
            System.out.println("No such destination ...");
        }

        // Source path on the local file system of the client.
        Path srcPath = new Path("wdbcSample.csv");
        try {
            hdfs.copyFromLocalFile(srcPath, dstPath);
            System.out.println("Done successfully ...");
        } catch (Exception ex) {
            ex.printStackTrace();
        } finally {
            hdfs.close();
        }
    }
}
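The NameNode endpoint used above is an ordinary URI, so its pieces map directly onto what the client needs: the scheme selects the HDFS protocol, the host and port identify the NameNode's IPC endpoint, and the path is the file's location inside HDFS. A quick, self-contained sketch of that breakdown, reusing the placeholder hostname from the example:

```java
import java.net.URI;

public class HdfsUriParts {
    public static void main(String[] args) {
        // "hostname" is the placeholder from the example above;
        // substitute your actual NameNode host.
        URI dataset = URI.create("hdfs://hostname:8020/samples/data/wdbcSample.csv");
        System.out.println("scheme = " + dataset.getScheme()); // hdfs  (the HDFS protocol)
        System.out.println("host   = " + dataset.getHost());   // hostname  (NameNode host)
        System.out.println("port   = " + dataset.getPort());   // 8020  (NameNode IPC port on HDP)
        System.out.println("path   = " + dataset.getPath());   // /samples/data/wdbcSample.csv
    }
}
```

This is also the form WSO2 ML expects when you point it at the dataset, so it is worth checking that the host and port here match the ones your NameNode actually listens on.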