1

I have a Java function that operates on a large amount of data, possibly 500 MB. I have to pass this 500 MB of data to the function and get the processed data back from it.

My data is in tabular form, as follows:

col1  col2 col3 col4 col5 col6
 3     5    2     5    1   6
 7     5    6     8    3   8
 5     3    7     9    8   1

I have a few ideas in mind, but I don't know which one is most efficient or how to implement them, for example which Java API each one would need.

  1. Convert the data into Java objects (one object of the same class per row), then pass the objects as an array to the Java function.
  2. Build an XML document from the tabular data and pass it to the Java function. Inside the function, extract the objects from the XML document.
  3. Save the tabular data to a file and pass the file as an argument to the Java function.

These are the ideas I have in mind. If someone can give the pros and cons of the three methods above, or suggest a new one, I would be grateful.

Brian Tompsett - 汤莱恩
Surjya Narayana Padhi
  • In order to give a good answer much more detail is needed. Where do you get the data from? What is the function supposed to do with it? – Henry Jun 25 '14 at 06:30
  • I think you have to test your ideas. It depends on the processing you want to do. – Nicolas Henrard Jun 25 '14 at 06:31
  • Just to correct you, there are NO FUNCTIONS in Java, it has only **Methods**. http://stackoverflow.com/a/16335031/1055241 – gprathour Jun 25 '14 at 06:42
  • @GPRathour technically and historically you are correct (if you speak about the Java -language-), although you could argue that static methods serve the purpose of a function. In any case terminology is a big conflicting mess nowadays... – Gimby Jun 25 '14 at 08:22
  • @GPRathour Yeah... well, a Java static method is pretty much a function. Let's not get into a nomenclature fight. – Tarik Jun 25 '14 at 08:48
  • Could you explain what you are trying to do so as to have the answers adapted to the context of your application. – Tarik Jun 25 '14 at 08:49

4 Answers

1

Passing an array passes only a reference, so no data is copied, which is as efficient as it can be. Any modification the method makes is made to the caller's array, so nothing needs to be returned.
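A minimal sketch of that behaviour, using the sample rows from the question. The class name `InPlaceDemo` and the doubling step are hypothetical stand-ins for whatever the real processing is:

```java
import java.util.Arrays;

class InPlaceDemo {
    // Hypothetical processing step: doubles every cell of the table in place.
    static void doubleAll(int[][] table) {
        for (int[] row : table) {
            for (int i = 0; i < row.length; i++) {
                row[i] *= 2;
            }
        }
    }

    public static void main(String[] args) {
        int[][] table = {
            {3, 5, 2, 5, 1, 6},
            {7, 5, 6, 8, 3, 8},
            {5, 3, 7, 9, 8, 1},
        };
        // Only the array reference is handed over; no rows are copied.
        doubleAll(table);
        // The caller sees the modified values without anything being returned.
        System.out.println(Arrays.deepToString(table));
    }
}
```

This only applies when caller and method run in the same JVM; once the call crosses a process or network boundary the data has to be serialized and copied.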

Tarik
  • Thanks a lot for the suggestion. Could you please let me know whether this method is applicable if the function I need to call is on another server? – Surjya Narayana Padhi Jun 25 '14 at 06:30
  • @SurjyaNarayanaPadhi if you need to pass 500 MB to another server for a method call, I would seriously recommend having a hard, long look at your architecture. – Tassos Bassoukos Jun 25 '14 at 06:33
  • No, you do not want to pass 500MB via some RPC mechanism. – Tarik Jun 25 '14 at 06:37
  • @TassosBassoukos this is not necessarily an architectural problem. There are a lot of applications that need to transfer huge amounts of data as part of their requirements (e.g. video streaming services). I would suggest using input/output streams, which are the intended Java mechanism for data transfer. Using streams gives you the opportunity to control the flow of the data, to manipulate it before sending or after receiving, etc. – ethanfar Jun 25 '14 at 06:42
  • @eitanfar I am aware that the need is there (I was transferring that amount in 98 between various stages of a meteorological model that were in different countries - fun times) - It's just that doing it as a function call is... suboptimal, mainly from a memory usage perspective. – Tassos Bassoukos Jun 25 '14 at 06:48
  • Consider Java serialization as well! – AJJ Jun 25 '14 at 06:59
  • @TassosBassoukos Does it need to be done in the first place? I mean that unless necessary, I would avoid passing around 1/2 GB worth of data. – Tarik Jun 25 '14 at 08:43
1

If you are reading the data from a file, you can map the file into memory, so the entire file is not read up front. Look in here
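A rough sketch of memory mapping with `java.nio`, assuming (this is not stated in the question) that the table is stored as a binary file of 32-bit integers. The class name `MappedSum` and the summing step are hypothetical:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedSum {
    // Sums every 32-bit int in the file (big-endian, ByteBuffer's default order).
    static long sumInts(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The OS pages the file in on demand; nothing is read eagerly here.
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buf.remaining() >= Integer.BYTES) {
                sum += buf.getInt();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        // Write the first sample row to a temporary file, then sum it via the mapping.
        Path file = Files.createTempFile("table", ".bin");
        file.toFile().deleteOnExit();
        ByteBuffer out = ByteBuffer.allocate(6 * Integer.BYTES);
        for (int v : new int[] {3, 5, 2, 5, 1, 6}) {
            out.putInt(v);
        }
        Files.write(file, out.array());
        System.out.println(sumInts(file));
    }
}
```

A single mapping is limited to `Integer.MAX_VALUE` bytes (about 2 GB), so a 500 MB file fits in one mapping; larger files would need to be mapped in several regions.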

Sameera Kumarasingha
1

Since you have a large amount of data in tabular format, have you considered using Java DB (database)? Granted, this depends on what kind of processing you're going to do, how long you have to develop, and how well you already know databases/SQL, but it sounds like you're going to read the data in row by row, and databases are a good way to do this, especially with large amounts of data.

There is information about the JDBC API here on the Java Trail, with steps on how to use it: http://docs.oracle.com/javase/tutorial/jdbc/overview/index.html

From the Java Trail:

The JDBC API is a Java API that can access any kind of tabular data, especially data stored in a Relational Database.

Some things to keep in mind:

  • You have to know/learn SQL or other querying language.
  • You'll have to design the structure of the database and build it, although probably you can use a similar structure to what you were planning in your XML file.
  • KEYS! Keys are unique identifiers for each row in your database, like an ID number. I highly recommend you add a separate field/column to use as a key, especially if you're new to databases. They increase the memory overhead of your database a small amount but in return you don't have to worry about identifying unique rows and can quickly go back to a row you've already searched.
  • You can pick and choose what data to bring in - don't bring in more than you need.
Alium Britt
  • I highly support this idea, although I interpret from the question description that the method to do the processing already exists and is thus not designed to work on a database. So that would require redesigning the existing code too. – Gimby Jun 25 '14 at 13:53
0

If the data is going to be processed by a Java function/method, consider processing it in chunks rather than all at once. You can settle on the chunk size by experiment: start with, say, 10 KB, measure the performance, and adjust. The best size depends on the execution environment. There are several ways to get chunks of data from a file, stream, or database (even on a remote server). You need to post more details about your problem to get better suggestions.
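One way this could look, as a sketch only: it assumes the table arrives as whitespace-separated text with one row per line, and it measures the chunk in rows rather than bytes. `ChunkedRows`, `processChunk`, and the summing logic are hypothetical placeholders for the real processing:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ChunkedRows {
    // Hypothetical per-chunk work: sums all cells of the rows in this chunk.
    static long processChunk(List<int[]> rows) {
        long sum = 0;
        for (int[] row : rows) {
            for (int cell : row) {
                sum += cell;
            }
        }
        return sum;
    }

    // Parses one whitespace-separated line of integers into a row.
    static int[] parseRow(String line) {
        String[] cells = line.trim().split("\\s+");
        int[] row = new int[cells.length];
        for (int i = 0; i < cells.length; i++) {
            row[i] = Integer.parseInt(cells[i]);
        }
        return row;
    }

    // Reads the source line by line, keeping at most chunkRows rows in memory
    // (chunkRows is expected to be positive).
    static long processInChunks(Reader source, int chunkRows) throws IOException {
        long total = 0;
        List<int[]> chunk = new ArrayList<>(chunkRows);
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                chunk.add(parseRow(line));
                if (chunk.size() == chunkRows) {
                    total += processChunk(chunk);
                    chunk.clear();
                }
            }
        }
        if (!chunk.isEmpty()) {
            total += processChunk(chunk); // last, partially filled chunk
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String data = " 3 5 2 5 1 6\n 7 5 6 8 3 8\n 5 3 7 9 8 1\n";
        System.out.println(processInChunks(new StringReader(data), 2));
    }
}
```

Because `processInChunks` accepts any `Reader`, the same loop can be fed from a file (`Files.newBufferedReader`) or from a socket stream wrapped in an `InputStreamReader`.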

WApp