I'm reading all the files in a directory and trying to save all the words from those files in a hash map. The words should be stored under a key that is the name of the directory.
For instance, a directory called atheism contains one file called a0.txt, which contains the word Gott; another file in the same directory, a1.txt, contains the word ist; and a third file, a2.txt, contains the word tot. I want to save all these words under the hash map key atheism.
Later I want to generalize this to accommodate big multi-line documents stored under the key of a particular directory. Below I've posted the code I'm working with right now to populate the hash map.
I had a look at this and tried to adapt it, but in the end I didn't find it applicable to my situation.
What I want to do is access the ArrayList associated with a specific key in the hash map and just add the new words onto the end of it. How can I do that?
I need these words because this is part of a program that implements the perceptron algorithm; I'm saving the words as part of generating a bag-of-words feature vector.
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Map;
    import java.util.Set;

    public static void iterateDirectory(File directory,
                                        boolean globo_dict_fixed,
                                        Map<String, ArrayList<String>> fileDict,
                                        Set<String> GLOBO_DICT) throws IOException
    {
        for (File file : directory.listFiles())
        {
            if (file.isDirectory())
            {
                // recurse into the subdirectory
                iterateDirectory(file, globo_dict_fixed, fileDict, GLOBO_DICT);
            }
            else
            {
                String line;
                ArrayList<String> document_words_on_line = new ArrayList<String>();
                // try-with-resources closes the reader once the file has been read
                try (BufferedReader br = new BufferedReader(new FileReader(file)))
                {
                    while ((line = br.readLine()) != null)
                    {
                        String[] words = line.split(" "); // those are your words
                        if (globo_dict_fixed == false)
                        {
                            Data_GloboPop.populate_globo_dict(words, GLOBO_DICT);
                        }
                        else
                        {
                            for (int i = 0; i < words.length; i++)
                            {
                                document_words_on_line.add(words[i]);
                            }
                        }
                    }
                }
                // strip the file name and the train/test prefix so only the directory name is left
                String key_file_loke = file.getPath()
                        .toString()
                        .replaceAll("/[^/]*$", "")
                        .replaceAll("/home/matthias/Workbench/SUTD/ISTD_50.570/assignments/practice_data/data/train/", "")
                        .replaceAll("/home/matthias/Workbench/SUTD/ISTD_50.570/assignments/practice_data/data/test/", "");
                // this should be here, meaning that the line is null and the file is over
                // put all documents from the same directory under the same key
                // note: put() replaces whatever list was already stored under this key
                fileDict.put(key_file_loke, document_words_on_line);
            }
        }
    }
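
For the append step described above, here is a minimal sketch of one possible approach, assuming Java 8 or later. The class name DirectoryWords and the helpers appendWords and addFile are hypothetical names made up for this illustration, and the map's value type is List<String> rather than ArrayList<String> purely for brevity. The idea is to let Map.computeIfAbsent create the list the first time a key is seen and then add to that same list on every later call, instead of calling put with a fresh list per file. The sketch also assumes the key can be taken from the name of the file's parent directory, which may or may not match the path layout in the code above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class DirectoryWords {

    // Appends words to the list stored under key; the list is created on first use.
    // (Hypothetical helper, not part of the original code.)
    static void appendWords(Map<String, List<String>> fileDict, String key, List<String> words) {
        fileDict.computeIfAbsent(key, k -> new ArrayList<>()).addAll(words);
    }

    // Reads one file and appends its whitespace-separated words under the name
    // of the directory that contains it (e.g. "atheism").
    // Assumption: the path has a parent directory component and the file is UTF-8 text.
    static void addFile(Map<String, List<String>> fileDict, Path file) throws IOException {
        String key = file.getParent().getFileName().toString();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                appendWords(fileDict, key, Arrays.asList(trimmed.split("\\s+")));
            }
        }
    }
}
```

With the example from the question, calling addFile on atheism/a0.txt, atheism/a1.txt and atheism/a2.txt in that order would leave the words Gott, ist and tot in a single list under the key atheism. The same computeIfAbsent call should work unchanged with a Map<String, ArrayList<String>>.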