Java text extraction and data structure design

Question

I have a huge number of tables in OpenOffice 3.0 document format.

   Table 1:
    (x range)|(x1,y1) |(x2,y2)|(x3,y3)|(x4,y4)
    (-20,90) |(-20,0) |(-5,1) |(5,1)  |(10,0)
    ...

Likewise, I have n tables. All of these tables are fuzzy-set membership functions; in simple terms, they are computational models according to which I have to process the input data. There are many such tables, with varying numbers of rows and either 3 or 4 columns. The data will not change once loaded.

Example: when I get a value of x in the range -20 to 90, I apply the first rule (given above). Suppose x lies between -20 and -5 (for example, x = -10); then I have to find the corresponding value between 0 and 1.
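The lookup in this example is linear interpolation between the two table points that bracket x. A minimal sketch, assuming each row is stored as strictly increasing x values with matching y values; `MembershipFunction` and `valueAt` are hypothetical names, not a library API. Arrays of 3 or 4 points cover both triangular and trapezoidal tables:

```java
/**
 * Hypothetical sketch: a fuzzy membership function given by (x, y) points,
 * evaluated by linear interpolation between neighbouring points.
 * Assumes strictly increasing x values; outside the first/last point the
 * nearest end value is returned (an assumption; it is 0 for Table 1).
 */
public class MembershipFunction {
    private final double[] xs;
    private final double[] ys;

    public MembershipFunction(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) points");
        }
        for (int i = 1; i < xs.length; i++) {
            if (xs[i] <= xs[i - 1]) {
                throw new IllegalArgumentException("x values must be strictly increasing");
            }
        }
        this.xs = xs.clone();
        this.ys = ys.clone();
    }

    /** Returns the interpolated membership degree for x. */
    public double valueAt(double x) {
        int last = xs.length - 1;
        if (x <= xs[0]) return ys[0];
        if (x >= xs[last]) return ys[last];
        // Find the segment [xs[i-1], xs[i]] that contains x.
        int i = 1;
        while (xs[i] < x) i++;
        double t = (x - xs[i - 1]) / (xs[i] - xs[i - 1]);
        return ys[i - 1] + t * (ys[i] - ys[i - 1]);
    }

    public static void main(String[] args) {
        // Points from Table 1: (-20,0), (-5,1), (5,1), (10,0)
        MembershipFunction f = new MembershipFunction(
                new double[] {-20, -5, 5, 10},
                new double[] {0, 1, 1, 0});
        System.out.println(f.valueAt(-10)); // 0.6666666666666666 (rising edge)
        System.out.println(f.valueAt(-1));  // 1.0 (plateau)
        System.out.println(f.valueAt(8));   // 0.4 (falling edge)
    }
}
```

Whether returning the end value outside the listed points is right for every table is an assumption worth checking against the actual rules.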

My first question is how to extract all the data from the tables in the document so that I can use it in my Java program. I know a bit of Python, and I know Python can be useful in such cases, but then how would I use it from my Java program?

Second, what would be the best data structure to use in such a scenario?

Note: I'm not using any database, so I would prefer to keep the tables in XML or some other format that I can load easily into the program. I am also thinking of building a suitable data structure and then serializing it, so that I can load it whenever required instead of parsing a file and recreating the data structure. Please post your comments.
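Serializing the loaded structure, as the note suggests, works with plain Java serialization because `HashMap`, `TreeMap` and `Double` are all `Serializable`. A minimal sketch, assuming the tables are held in memory as a map from table name to an (x, y) point map; the class name `TableStore` and the file name are hypothetical:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: write the parsed tables once with Java serialization
 * and read them back on later runs instead of re-parsing the source file.
 */
public class TableStore {
    static void save(HashMap<String, TreeMap<Double, Double>> tables, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(tables);
        }
    }

    @SuppressWarnings("unchecked") // readObject returns Object; the cast matches what save() wrote
    static HashMap<String, TreeMap<Double, Double>> load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (HashMap<String, TreeMap<Double, Double>>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Points from Table 1, keyed by x.
        TreeMap<Double, Double> table1 = new TreeMap<>();
        table1.put(-20.0, 0.0);
        table1.put(-5.0, 1.0);
        table1.put(5.0, 1.0);
        table1.put(10.0, 0.0);

        HashMap<String, TreeMap<Double, Double>> tables = new HashMap<>();
        tables.put("Table 1", table1);

        // A temporary file stands in for a real location such as "tables.ser" (hypothetical).
        File file = File.createTempFile("tables", ".ser");
        file.deleteOnExit();
        save(tables, file);
        System.out.println(load(file).equals(tables)); // true
    }
}
```

One design trade-off: a serialized file is tied to the Java class definitions and cannot be edited by hand, whereas a text format such as CSV or XML stays readable and can be regenerated from the source document.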

@All - please read the comments to other answers, the question is not about parsing the formatted table in text format. — Andreas Dolk, Aug 19 '10 at 06:04
@Andreas: I have mentioned it above the table. Maybe you missed it. I'll highlight it for you. — Emil, Aug 19 '10 at 06:07
*document format* doesn't indicate that you have an OpenOffice Document (which version?) that holds the data. So the task is *not* parsing the example text but extracting data from an OpenOffice document. That was my concern. — Andreas Dolk, Aug 19 '10 at 06:17
@Andreas: Sorry for the misunderstanding I caused. I have edited the question. — Emil, Aug 19 '10 at 06:20

score 1 · Accepted Answer · answered Aug 19 '10 at 05:57


To parse an OpenOffice document in Java (to extract data), you can use a dedicated API such as ODFDOM. I think this solution is too complicated for what you need. An easier solution would be to extract the OpenOffice table manually and put it in a format that is easier to parse in Java:

- CSV
- Database (MySQL, etc.)
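Once the table has been exported to CSV, loading it needs only the standard library. A minimal sketch, assuming the rows are rearranged to one number per cell in the order xMin,xMax,x1,y1,x2,y2[,x3,y3[,x4,y4]] with no header row; that layout and the `CsvTableLoader`/`Rule` names are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a CSV loader. Assumed row layout (one number per cell):
 * xMin,xMax,x1,y1,x2,y2[,x3,y3[,x4,y4]]
 */
public class CsvTableLoader {
    /** One fuzzy rule: the x range it applies to and its (x -> y) points, sorted by x. */
    static final class Rule {
        final double xMin, xMax;
        final TreeMap<Double, Double> points = new TreeMap<>();
        Rule(double xMin, double xMax) { this.xMin = xMin; this.xMax = xMax; }
    }

    static List<Rule> load(Reader source) throws IOException {
        List<Rule> rules = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue; // skip blank lines
                String[] cells = line.split(",");
                // Two range cells plus at least two (x, y) pairs.
                if (cells.length < 6 || cells.length % 2 != 0) {
                    throw new IOException("unexpected row: " + line);
                }
                Rule rule = new Rule(Double.parseDouble(cells[0].trim()),
                                     Double.parseDouble(cells[1].trim()));
                for (int i = 2; i < cells.length; i += 2) {
                    rule.points.put(Double.parseDouble(cells[i].trim()),
                                    Double.parseDouble(cells[i + 1].trim()));
                }
                rules.add(rule);
            }
        }
        return rules;
    }

    public static void main(String[] args) throws IOException {
        // With a real export: load(new java.io.FileReader("table1.csv")), a hypothetical file name.
        String csv = "-20,90,-20,0,-5,1,5,1,10,0\n";
        List<Rule> rules = load(new StringReader(csv));
        System.out.println(rules.size() + " rule(s), points " + rules.get(0).points);
    }
}
```

The naive `split(",")` is enough for purely numeric cells; quoted fields (for example a cell still holding "(-20,0)") would need a real CSV parser or a cleanup step in the spreadsheet first.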

answered Aug 19 '10 at 05:57

Benoit Courtine


+1 for the practical advice: copy and paste the table into a spreadsheet and export the table data from there to CSV. Makes life much easier. – Andreas Dolk Aug 19 '10 at 07:14
Thanks. It helped. For the data structure I decided to use a NavigableMap; see http://stackoverflow.com/questions/3519901/get-values-for-keys-within-a-range-in-java – Emil Aug 19 '10 at 10:14
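For the range lookup this comment refers to, `NavigableMap.floorEntry` returns the entry with the greatest key less than or equal to the given key, so storing each rule under the lower bound of its x range finds the candidate rule in O(log n) with a `TreeMap`. A minimal sketch, assuming the ranges do not overlap; `RangeLookup`, the rule labels and the second range (100 to 150) are hypothetical:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: each rule is stored under the lower bound of its x range,
 * and floorEntry(x) finds the rule whose range starts at or below x.
 * Assumes non-overlapping ranges.
 */
public class RangeLookup {
    /** A rule's upper bound plus a label standing in for its membership table. */
    static final class Rule {
        final double xMax;
        final String name;
        Rule(double xMax, String name) { this.xMax = xMax; this.name = name; }
    }

    private final NavigableMap<Double, Rule> byLowerBound = new TreeMap<>();

    void add(double xMin, double xMax, String name) {
        byLowerBound.put(xMin, new Rule(xMax, name));
    }

    /** Returns the name of the rule whose [xMin, xMax] range contains x, or null. */
    String find(double x) {
        Map.Entry<Double, Rule> e = byLowerBound.floorEntry(x); // greatest xMin <= x
        if (e == null || x > e.getValue().xMax) return null;    // below all ranges, or in a gap
        return e.getValue().name;
    }

    public static void main(String[] args) {
        RangeLookup rules = new RangeLookup();
        rules.add(-20, 90, "Table 1");   // range from the question
        rules.add(100, 150, "Table 2");  // hypothetical second range
        System.out.println(rules.find(-1));  // Table 1
        System.out.println(rules.find(95));  // null (gap between ranges)
        System.out.println(rules.find(120)); // Table 2
    }
}
```

Once the rule is found, its own (x, y) points can be kept in another `TreeMap`, where `floorEntry` and `ceilingEntry` give the two points to interpolate between.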
