I'm trying to find a more or less foolproof way to solve the following problem, and would be grateful for any suggestions on a better approach.
My application handles text that comes from a file and can be of unlimited length (in my current example, I have 8,000 words in a text file). For a new project, the text (`String`) is loaded and tokenized (words), and the tokens are written to a graph model and displayed in a `JTable` (where they can be tagged). The graph can be saved to XML (and must be, if the project is to be loadable later).
The table consists of 3 columns (token index, token text, tag).
The problem is that writing each token to the graph model takes a long time (~7 min for 8,000 words). Thus I need to display the `JTable` before this writing process is completed and make it available for tagging. Not a big problem: since the actual tokenization (clean & split `String`) is fast, I can display the table based on the `String` rather than filling it from the graph model (for tokens and tags).
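To illustrate filling the table directly from the text, here is a minimal sketch. The class name `TokenTableModels`, the column titles, and the plain whitespace split are assumptions for illustration only; the actual clean & split step is not shown in this post.

```java
import javax.swing.table.DefaultTableModel;

/** Hypothetical helper: builds the table model from the text, not from the graph. */
class TokenTableModels {
    static DefaultTableModel fromText(String text) {
        // Three columns as described above; column titles are assumptions.
        DefaultTableModel model =
                new DefaultTableModel(new Object[] {"Index", "Token", "Tag"}, 0);
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return model; // no tokens, no rows
        }
        // Assumption: a whitespace split stands in for the real tokenization.
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            model.addRow(new Object[] {i, tokens[i], ""}); // tag starts out empty
        }
        return model;
    }
}
```

The resulting model could then be handed to the table with `new JTable(model)` while the graph is still being filled in the background.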
The real problem is that a) the writing-to-graph process must be finished before the model can be saved, and b) I need to write tags to the model for tokens that might not exist yet (when the user is quicker than the writing process for tokens, e.g. when they choose a tag for the last of a large number of words while the writing process has only just started). So I want to be able to check the table for newly set tags as soon as possible after the tagging action, check whether the token for the tag already exists in the graph, write the tag if it does, and re-check later if it doesn't.
Here's a prose outline of how I thought it could be done. I'd be grateful if you had a look at it and let me know whether you see opportunities for optimization and/or blatant errors. It would save me a lot of time and refactoring later.
Thanks a lot!
Preparation
- Load text from file
- Clean and split text into tokens
- Display table based on cleaned and split text
- Start process of writing tokens to graph (`Thread`) + update an `int` (e.g., `int lastWritten`) with the index of the last token written
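The last preparation step could be sketched roughly as follows. `TokenGraph` and `TokenWriter` are hypothetical names, since the real graph model is not shown here, and an `AtomicInteger` is used in place of a plain `int` so that the value set by the writer thread is reliably visible to the thread handling the table.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the graph model; the real API is not shown in the post. */
interface TokenGraph {
    void addToken(int index, String text);
}

/** Writes tokens to the graph and publishes the index of the last token written. */
class TokenWriter implements Runnable {
    private final List<String> tokens;
    private final TokenGraph graph;
    // -1 means that no token has been written yet.
    private final AtomicInteger lastWritten = new AtomicInteger(-1);

    TokenWriter(List<String> tokens, TokenGraph graph) {
        this.tokens = tokens;
        this.graph = graph;
    }

    @Override
    public void run() {
        for (int i = 0; i < tokens.size(); i++) {
            graph.addToken(i, tokens.get(i));
            lastWritten.set(i); // publish only after the write has completed
        }
    }

    int lastWritten() {
        return lastWritten.get();
    }

    boolean isFinished() {
        return lastWritten.get() == tokens.size() - 1;
    }
}
```

The writer would be started with something like `new Thread(writer, "token-writer").start()`, after which `lastWritten()` can be read from any other thread.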
Tagging action by user
- Check if token to be tagged exists in graph (via `lastWritten`)
  - Yes: Write tag to graph, display tag in table
  - No: Save index of token in a list (e.g., `taggedTokenNotWritten`), display tag in table
    - Start new `Thread` that checks whether tokens in `taggedTokenNotWritten` have been written yet. If so - write tag to token + delete from list; if not - keep in list. Run `Thread` periodically (e.g., whenever a tag is set*?*).
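The tagging steps above might look roughly like this. It is only a sketch under assumptions: `TagSink` and `PendingTags` are hypothetical names, a map from token index to tag replaces the plain list of indices so that the chosen tag is kept alongside the index, and the graph itself is assumed to tolerate writes from more than one thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntSupplier;

/** Hypothetical callback for writing a tag to the graph; the real API is not shown. */
interface TagSink {
    void writeTag(int tokenIndex, String tag);
}

class PendingTags {
    // Token index -> tag chosen for a token that is not in the graph yet.
    private final Map<Integer, String> taggedTokenNotWritten = new ConcurrentHashMap<>();
    private final IntSupplier lastWritten; // index of the last token written so far
    private final TagSink graph;

    PendingTags(IntSupplier lastWritten, TagSink graph) {
        this.lastWritten = lastWritten;
        this.graph = graph;
    }

    /** Tagging action: write now if the token exists, otherwise remember the tag. */
    void tag(int tokenIndex, String tag) {
        if (tokenIndex <= lastWritten.getAsInt()) {
            taggedTokenNotWritten.remove(tokenIndex); // drop any stale pending tag
            graph.writeTag(tokenIndex, tag);
        } else {
            taggedTokenNotWritten.put(tokenIndex, tag); // a later tag overwrites an earlier one
        }
    }

    /** Re-check: write every pending tag whose token has been written in the meantime. */
    void flush() {
        int written = lastWritten.getAsInt();
        for (Map.Entry<Integer, String> e : taggedTokenNotWritten.entrySet()) {
            Integer index = e.getKey();
            String tag = e.getValue();
            // Write only if this exact entry was still pending when it was removed.
            if (index <= written && taggedTokenNotWritten.remove(index, tag)) {
                graph.writeTag(index, tag);
            }
        }
    }

    boolean isEmpty() {
        return taggedTokenNotWritten.isEmpty();
    }
}
```

Instead of starting a new `Thread` for every re-check, `flush()` could be run on a fixed delay with a `ScheduledExecutorService` (for example via `scheduleWithFixedDelay`), or simply be called by the token-writing thread after each token.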
Save action by user
- Check whether tokenization process is finished + check whether `taggedTokenNotWritten` is empty.
  - If yes - save; if no - display message & save only after the above applies.
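The save check could be sketched as a small gate that either saves immediately or remembers the request and saves once both conditions hold. `SaveGate` and its method names are hypothetical; the two conditions and the actual XML save are passed in as callbacks because their real implementations are not shown here.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical gate that defers a requested save until the graph is complete. */
class SaveGate {
    private final BooleanSupplier tokensFinished; // all tokens written to the graph?
    private final BooleanSupplier noPendingTags;  // taggedTokenNotWritten empty?
    private final Runnable saveAction;            // e.g. the XML export
    private boolean saveRequested;

    SaveGate(BooleanSupplier tokensFinished, BooleanSupplier noPendingTags, Runnable saveAction) {
        this.tokensFinished = tokensFinished;
        this.noPendingTags = noPendingTags;
        this.saveAction = saveAction;
    }

    /** Save action by the user: returns true if saved at once, false if deferred. */
    synchronized boolean requestSave() {
        if (ready()) {
            saveAction.run();
            return true;
        }
        saveRequested = true; // caller can display the "please wait" message
        return false;
    }

    /** To be called whenever progress is made (token written, pending tags flushed). */
    synchronized void onProgress() {
        if (saveRequested && ready()) {
            saveRequested = false;
            saveAction.run();
        }
    }

    private boolean ready() {
        return tokensFinished.getAsBoolean() && noPendingTags.getAsBoolean();
    }
}
```

Note that in this sketch a deferred save runs on whichever thread calls `onProgress()`, so any Swing updates triggered by it would have to be passed to the event dispatch thread, e.g. with `SwingUtilities.invokeLater`.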