
I was trying to upload a 10GB CSV file into WSO2 ML, but I could not do it; it gave me errors. I followed this link to change the size limit of my dataset in WSO2 ML: https://docs.wso2.com/display/ML100/FAQ#FAQ-Isthereafilesizelimittomydataset?Isthereafilesizelimittomydataset?

I am running WSO2 ML on a PC with the following characteristics: 50GB RAM, 8 cores.

Thanks


2 Answers


When it comes to uploading datasets into WSO2 Machine Learner, we provide three options.

  1. Uploading files from your local file system. As you have mentioned, the maximum upload limit is set to 100MB, and you can increase the limit by setting the -Dorg.apache.cxf.io.CachedOutputStream.Threshold option in your wso2server.dat file. We have tested this feature with a 1GB file. However, for large files, we don't recommend this option. The main use case of this functionality is to allow users to quickly try out some machine learning algorithm with small datasets.
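A minimal sketch of what that change looks like, assuming a standard wso2server startup script and taking the threshold value to be in bytes (the ~1GB figure below is only an illustrative value, not a documented recommendation):

    # illustrative excerpt: append the property to the existing -D options
    # passed to the JVM at server startup (value assumed to be in bytes)
    -Dorg.apache.cxf.io.CachedOutputStream.Threshold=1073741824 \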

Since you are working with a large dataset, we recommend the following two approaches for uploading your dataset into the WSO2 ML server.

  1. Upload the data using the Hadoop Distributed File System (HDFS). We have given a detailed description of how to use HDFS files in WSO2 ML in our documentation [1].

  2. If you have an up-and-running WSO2 DAS instance, integrating WSO2 ML with WSO2 DAS lets you simply point to a DAS table as the source type in WSO2 ML's "Create Dataset" wizard. For more details on integrating WSO2 ML with WSO2 DAS, please refer to [2].

If you need more help regarding this issue, please let me know.

[1]. https://docs.wso2.com/display/ML100/HDFS+Support

[2]. https://docs.wso2.com/display/ML110/Integration+with+WSO2+Data+Analytics+Server

Upul Bandara
  • Thanks Upul. What is the maximum dataset size using DAS that you have tried? – Yandy Perez Ramos May 07 '16 at 01:27
  • For your info, if you happen to use HDP (Hortonworks) as part of your HDFS solution, you may need to use the NameNode port 8020 via IPC in this case, i.e. hdfs://hostname:8020/samples/data/wdbcSample.csv. I am not sure, though, what the maximum data file limit is with this HDFS approach to creating a dataset in WSO2 ML, as I am still afraid of crashing my WSO2 ML server if the dataset to be created is larger than 1 GB or 10 GB. Any thoughts on this limit with WSO2 ML's capacity? – john smith Jul 28 '16 at 11:18
  • Hello … how can I load my data files into my local WSO2 DAS data table first, before starting to create a dataset from DAS, if I am using the embedded Spark server bundled with my WSO2 ML installation? Please help. – john smith Jul 28 '16 at 18:49

For those who want to use HDP (Hortonworks) as part of their HDFS solution to load a large dataset into WSO2 ML via the NameNode IPC port 8020, i.e. hdfs://hostname:8020/samples/data/wdbcSample.csv, you may also need to ingest the data file into HDFS in the first place, for example with the following Java client:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class HdfsUploader { // illustrative class name

    public static void main(String[] args) throws Exception {

        Configuration configuration = new Configuration();

        // Connect to the HDFS NameNode (IPC port 8020 on HDP)
        FileSystem hdfs = FileSystem.get(new URI("hdfs://hostname:8020"), configuration);
        Path dstPath = new Path("hdfs://hostname:8020/samples/data/wdbcSample.csv");

        // Remove any existing file at the destination path
        if (hdfs.exists(dstPath)) {
            hdfs.delete(dstPath, true);
        } else {
            System.out.println("No such destination ...");
        }

        Path srcPath = new Path("wdbcSample.csv"); // a local file path on the client side

        try {
            // Copy the local CSV into HDFS
            hdfs.copyFromLocalFile(srcPath, dstPath);
            System.out.println("Done successfully ...");
        } catch (Exception ex) {
            ex.printStackTrace();
        } finally {
            hdfs.close();
        }
    }
}
john smith