
I'm trying to find a more or less foolproof way to solve the following problem, and would be grateful for suggestions on a better approach.

My application handles text that comes from a file and can be of arbitrary length (my current example is a text file of 8,000 words). For a new project, the text (a String) is loaded and tokenized into words; the tokens are written to a graph model and displayed in a JTable, where they can be tagged. The graph can be saved to XML (and must be, if the project is to be loadable later).

The table consists of 3 rows (token index, token text, tag).

The problem is that writing each token to the graph model takes a long time (about 7 minutes for 8,000 words). I therefore need to display the JTable before the writing process has completed and make it available for tagging. That part is not a big problem: since the actual tokenization (clean and split the String) is fast, I can build the table from the String rather than filling it from the graph model (for tokens and tags).

The real problem is that a) the writing-to-graph process must be finished before the model can be saved, and b) I need to write tags to the model for tokens that might not exist yet (when the user is quicker than the writing process, e.g. when s/he chooses a tag for the last of a large number of words while the writing process has only just started). So I want to check the table for newly set tags as soon as possible after the tagging action, check whether the token for the tag already exists in the graph, write the tag if it does, and re-check later if it doesn't.

Here's a prose outline of how I thought it could be done. I'd be grateful if you had a look at it and let me know whether you see opportunities for optimization and/or blatant errors. It would save me a lot of time and refactoring later.

Thanks a lot!

Preparation

  1. Load text from file
  2. Clean and split text into tokens
  3. Display table based on cleaned and split text
  4. Start process of writing tokens to graph (Thread) + update int (e.g., int lastWritten) with index of last token written
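
A minimal sketch of step 4, assuming a hypothetical `TokenGraph` class as a stand-in for the real graph model (whose API is not shown in this question). `GraphWriter` and its method names are also made up for illustration. An `AtomicInteger` is used for `lastWritten` so that the value set by the writer thread is visible to the thread that reads it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the graph model; the real API is not shown here.
class TokenGraph {
    private final List<String> tokens = new ArrayList<>();

    synchronized void addToken(String text) { tokens.add(text); }

    synchronized int size() { return tokens.size(); }
}

// Writes tokens to the graph one by one and records the last written index.
class GraphWriter implements Runnable {
    private final List<String> tokens;
    private final TokenGraph graph;
    // -1 means "nothing written yet".
    private final AtomicInteger lastWritten = new AtomicInteger(-1);

    GraphWriter(List<String> tokens, TokenGraph graph) {
        this.tokens = tokens;
        this.graph = graph;
    }

    @Override
    public void run() {
        for (int i = 0; i < tokens.size(); i++) {
            graph.addToken(tokens.get(i)); // the slow step in the real application
            lastWritten.set(i);
        }
    }

    int lastWritten() { return lastWritten.get(); }

    boolean isFinished() { return lastWritten.get() == tokens.size() - 1; }
}
```

The writer would be started with `new Thread(writer).start()` right after the table has been built from the split text, so the table is usable while the loop is still running.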

Tagging action by user

  1. Check if token to be tagged exists in graph (via lastWritten)
    • Yes: Write tag to graph, display tag in table
    • No: Save index of token in a list (e.g., taggedTokenNotWritten), display tag in table
      • Start new Thread that checks whether tokens in taggedTokenNotWritten have been written yet. If so - write tag to token + delete from list; if not - keep in list. Run Thread periodically (e.g., whenever a tag is set?).
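
One possible variation on this outline, sketched under assumptions: instead of a separate Thread that polls the list periodically, the writer itself could flush a waiting tag right after it writes the matching token. `TagCoordinator` and all of its members are hypothetical names, and the `graphTags` map only stands in for writing a tag to the real graph. Both methods are `synchronized` so that a tag cannot slip between the "is it written yet?" check and the insertion into the pending map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical coordinator between the tagging action and the writer thread.
class TagCoordinator {
    // Stand-in for "tag written to the graph": token index -> tag.
    private final Map<Integer, String> graphTags = new HashMap<>();
    // Tags set by the user for tokens that are not in the graph yet.
    private final Map<Integer, String> pending = new HashMap<>();
    private int lastWritten = -1;

    // Called when the user sets a tag in the table.
    synchronized void tag(int index, String tag) {
        if (index <= lastWritten) {
            graphTags.put(index, tag);  // token exists: write the tag now
        } else {
            pending.put(index, tag);    // token missing: remember the tag
        }
    }

    // Called by the writer thread right after token `index` was written.
    synchronized void tokenWritten(int index) {
        lastWritten = index;
        String tag = pending.remove(index);
        if (tag != null) {
            graphTags.put(index, tag);  // flush the waiting tag
        }
    }

    synchronized boolean hasPending() { return !pending.isEmpty(); }

    synchronized String tagInGraph(int index) { return graphTags.get(index); }
}
```

With this arrangement the pending map is empty as soon as the last token has been written, so no periodic re-check is needed. Whether that fits depends on how the real graph model handles concurrent access, which is not shown here.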

Save action by user

  1. Check whether tokenization process is finished + check whether taggedTokenNotWritten is empty.
    • If yes - save; if no - display message & save only after the above applies.
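
The save step could be sketched as a small gate that runs the save immediately when everything has been written and otherwise remembers the request. `SaveGate` and its methods are hypothetical, and the `Runnable` stands in for the real XML export.

```java
// Hypothetical helper: defers a requested save until the graph is complete.
class SaveGate {
    private final Runnable saveAction;
    private boolean tokensFinished = false;
    private int pendingTags = 0;
    private boolean saveRequested = false;

    SaveGate(Runnable saveAction) { this.saveAction = saveAction; }

    private boolean ready() { return tokensFinished && pendingTags == 0; }

    // Called by the save action. Returns true if the save ran immediately;
    // on false the caller can display the "will save later" message.
    synchronized boolean requestSave() {
        if (ready()) {
            saveAction.run();
            return true;
        }
        saveRequested = true;
        return false;
    }

    // Called by the background work whenever its progress changes.
    synchronized void update(boolean tokensFinished, int pendingTags) {
        this.tokensFinished = tokensFinished;
        this.pendingTags = pendingTags;
        if (saveRequested && ready()) {
            saveRequested = false;
            saveAction.run(); // the deferred save
        }
    }
}
```

In this sketch the deferred save runs on whichever thread calls `update`. A Swing application would probably want to hand it over to the event dispatch thread or a worker instead.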
s.d
  • Having a hard time understanding what exactly you want to do ;-) You really have three _rows_ (vs. three columns)? So just a quick comment on the save: I would disable the action until the tokenization process is complete (for a better user experience you can set a tooltip on it to explain why it is disabled) – kleopatra Nov 07 '11 at 16:14
  • @kleopatra: Yes, three rows only :). Row one: indices, row two: word token, row three: tags (if exist). That's a good tip, disabling the save action until the tokenization is finished. Should have thought of that meself, really :). Thanks! – s.d Nov 08 '11 at 15:22

2 Answers

This is not an answer to your question.

Everything here points toward using an embedded database, but there are some issues:

1) The assumption that file I/O must be moved to a background task

  • Runnable#Thread

  • SwingWorker

2) You have to implement pagination

  • for the SQL engine (better and more comfortable)

  • for the JTable

3) Loading data from file I/O could be paused (Thread#sleep(10-25)) to avoid high CPU load

4) For that, DefaultTableModel is the best choice

5) A problem arises if you need to display data from the end of the file; then you have to implement two DefaultTableModels and two separate background tasks:

  • the first only loads the required data from the end of the file and immediately displays it in the JTable,
  • the second loads the data into the embedded database

6) It is very strange that loading the data takes ~7 min. for 8,000 words; there must be other problems (in my view, not part of this discussion)

mKorbel
  • This indeed doesn't answer my question, and it seems to be your habit to start your answers with the words "not answer to your question". Unfortunately, your "answers" also tend to be illegible. And to comment, 1) I'm not talking about `File I/O` but rather a graph model in memory which will eventually be persisted to an XML file. 2) I don't "have to" implement anything, it's a purpose-driven decision how to display my table and not discussed here. 3) Loading data from a file is not an issue here. – s.d Nov 07 '11 at 12:57
  • cont...: 4) I know you've commented on a question of mine concerning `TableModel`s, but I haven't said anything about it here, and will have my reasons for using whatever `TableModel` I use. 5) I don't want to display anything from the end of a file but rather give the user the choice to change the value of a cell in a table that's already been rendered. No problems with displaying the table in general. – s.d Nov 07 '11 at 13:02
  • still cont...: 6) I've timed the routines needed to write a token into the graph model I use (create token, add token to node, create relation, set token to relation, set token start and end index, add relation to graph), and it's not fast for 8k tokens (I use a `for` loop over the number of tokens). If you have ideas of how to speed this up, feel free to comment :). But you're right, this is not part of the discussion. Sorry, -1 because in my opinion you're off the track with your answer. Of course that's subject to change if you can clarify :). Thanks! – s.d Nov 07 '11 at 13:05

You might be able to leverage this example that uses a SwingWorker to asynchronously process a BlockingQueue of pending entries in its background thread.
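
The linked example is not reproduced here. As a rough sketch of the general idea only, a `SwingWorker` can block on a `BlockingQueue` in `doInBackground()` and handle entries as they arrive. `PendingTag`, `TagQueueWorker`, the `DONE` sentinel and the `graphTags` map are hypothetical names for illustration; the map merely stands in for writing to the real graph model.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import javax.swing.SwingWorker;

// Hypothetical queue entry: a tag the user has set for a token index.
final class PendingTag {
    // Sentinel that tells the worker to stop; compared by identity.
    static final PendingTag DONE = new PendingTag(-1, "");

    final int index;
    final String tag;

    PendingTag(int index, String tag) {
        this.index = index;
        this.tag = tag;
    }
}

// Drains the queue in the background and reports each handled entry.
class TagQueueWorker extends SwingWorker<Integer, PendingTag> {
    private final BlockingQueue<PendingTag> queue;
    private final Map<Integer, String> graphTags; // stand-in for the graph

    TagQueueWorker(BlockingQueue<PendingTag> queue, Map<Integer, String> graphTags) {
        this.queue = queue;
        this.graphTags = graphTags;
    }

    @Override
    protected Integer doInBackground() throws InterruptedException {
        int handled = 0;
        while (true) {
            PendingTag entry = queue.take();       // blocks until an entry arrives
            if (entry == PendingTag.DONE) {
                break;
            }
            graphTags.put(entry.index, entry.tag); // the slow graph write goes here
            publish(entry);                        // hand over to process() on the EDT
            handled++;
        }
        return handled;
    }

    @Override
    protected void process(List<PendingTag> chunks) {
        // Runs on the event dispatch thread: a place to update the table or
        // to enable the Save action once queue.isEmpty() holds.
    }
}
```

The tagging action would call `queue.offer(new PendingTag(row, tag))`, and the worker would be started once with `execute()`. A thread-safe map such as `ConcurrentHashMap` is assumed for `graphTags` in this sketch.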

trashgod
  • Thanks! Sounds very interesting. I'll try it. Seems to create code that's a lot cleaner than my idea. You're quickly becoming one of my favourite users here :). – s.d Nov 08 '11 at 15:26
  • I'm unsure where this would best be put to use. I guess tagging would be the best place? So tags (and the respective token index) would be pushed to the `BlockingQueue`, and the method writing tokens to the graph model would check whether a tag for the token that is to be written exists in the queue, and if so, takes and writes it to the graph? Otherwise the tags in the queue could only be written after all tokens have been written, which would block saving even longer. Or am I missing something (not unlikely ;-)). Thanks! – s.d Nov 11 '11 at 13:57
  • I had envisioned enabling the `Save` button when the queue was empty and disabling it otherwise. Naturally, whatever interim progress you can adduce would be user-friendly. – trashgod Nov 11 '11 at 14:17
  • Ah, that makes a lot of sense as well :). Thanks! – s.d Nov 11 '11 at 14:40