
I'm reading all the files in a directory and trying to save all the words from those files into a hash map. The words should be stored under a key that is the name of the directory.

For instance, a directory called atheism contains a file called a0.txt with the word Gott, a second file called a1.txt with the word ist, and a third file called a2.txt with the word tot. I want to save all of these words under the hash map key atheism.

Later I want to generalize this to accommodate big multi-line documents behind the key of a particular directory. Below I've posted the code I'm currently using to populate the hash map.

I had a look at this and tried to adapt it, but in the end I didn't find it applicable to my situation.

What I want to do is access the array list associated with a specific key in the hash map and append the new words to the end of it. How do I do that?

I need these words because this is part of a program that implements the perceptron algorithm. I'm saving the words as part of generating a bag-of-words feature vector.

public static void iterateDirectory( File directory, 
                                     boolean globo_dict_fixed, 
                                     Map<String, ArrayList<String> > fileDict,
                                     Set<String> GLOBO_DICT) throws IOException 
{
    for (File file : directory.listFiles()) 
    {
        if (file.isDirectory()) 
        {
            iterateDirectory(directory, globo_dict_fixed, fileDict, GLOBO_DICT );
        } 
        else 
        {   
            String line; 
            BufferedReader br = new BufferedReader(new FileReader( file ));


            ArrayList<String> document_words_on_line = new ArrayList<String>();

            while((line = br.readLine()) != null) 
            {
                String[] words = line.split(" ");//those are your words

                if(globo_dict_fixed == false)
                {
                    Data_GloboPop.populate_globo_dict( words, GLOBO_DICT );
                }
                else
                {
                    String word;

                    for (int i = 0; i < words.length; i++) 
                    {
                        word = words[i];

                        document_words_on_line.add(word);
                    }

                }

            }
            String key_file_loke = file.getPath()
                                       .toString()
                                       .replaceAll("/[^/]*$", "")
                                       .replaceAll("/home/matthias/Workbench/SUTD/ISTD_50.570/assignments/practice_data/data/train/", "")
                                       .replaceAll("/home/matthias/Workbench/SUTD/ISTD_50.570/assignments/practice_data/data/test/", "");
            //this should be here, meaning that the line is null and the file is over

            //put all documents from the same directory under the same key
            fileDict.put( key_file_loke , document_words_on_line );

        }
    }
}

3 Answers


Why not try this model?

Map<String,Map<String,String>> directoryFiles = new HashMap<>();
Map<String,String> fileNameAndContents = new HashMap<>();
//Create all file contents map add it to main map
directoryFiles.put("directory", fileNameAndContents);


public static void main(String[] d) throws Exception {
    Map<String,Map<String,String>> directoryFiles = new HashMap<>();
    listfileContent("d:/f1",directoryFiles);
    System.out.println(directoryFiles.toString());
}

public static void listfileContent(String directoryName,Map<String,Map<String,String>> directoryFiles) {
    File directory = new File(directoryName);
    // get all the files from a directory
    File[] files = directory.listFiles();
    Map<String,String> fileNameAndContents = new HashMap<>();
    for (File file : files) {
        if (file.isFile()) {
            fileNameAndContents.put(file.getName(), "FileContent " );
            directoryFiles.put(directoryName, fileNameAndContents);

        } else if (file.isDirectory()) {
            listfileContent(file.getAbsolutePath(),directoryFiles);
        }
    }
}

I think this will help you. In place of "FileContent " you can call a function that reads the data from the file.
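As a rough sketch of such a function: the class name FileContentReader, the method name readFileContent, and the choice of UTF-8 are all assumptions made for illustration, not part of the original code. It reads the whole file into one String and rethrows any IOException as an unchecked exception so it can be called from listfileContent without changing its signature.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileContentReader {
    // Hypothetical helper: returns the whole file as a single String.
    // Assumes the file is UTF-8 encoded and small enough to fit in memory.
    static String readFileContent(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temporary file, reads it back, and returns what was read.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("a0", ".txt");
            Files.write(tmp, "Gott".getBytes(StandardCharsets.UTF_8));
            String content = readFileContent(tmp);
            Files.delete(tmp);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints Gott
    }
}
```

In the loop above, the placeholder would then become something like fileNameAndContents.put(file.getName(), FileContentReader.readFileContent(file.toPath())).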

Pasupathi Rajamanickam

You want to "access the array associated with a specific key from within the hashmap".

Instead of creating a new ArrayList each time, replace this line
ArrayList<String> document_words_on_line = new ArrayList<String>();
with this line
ArrayList<String> document_words_on_line = fileDict.get(key_file_loke);
and compute key_file_loke before this assignment.

[Update]: If get returns null, initialize the list and put it into the map; otherwise keep using the same reference. In short:

String key_file_loke = /* your existing logic */;
ArrayList<String> document_words_on_line = fileDict.get(key_file_loke);
if (document_words_on_line == null) {
    document_words_on_line = new ArrayList<String>(); // assign only; re-declaring the variable here does not compile
    fileDict.put(key_file_loke, document_words_on_line);
}
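To make the get-or-create idea concrete, here is a small self-contained sketch. The class AppendToKeyDemo and the method addWords are hypothetical names introduced only for this example; the map type and the atheism words come from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

final class AppendToKeyDemo {
    // Appends words to the list stored under key, creating the list on first use.
    static void addWords(Map<String, ArrayList<String>> fileDict, String key, String[] words) {
        ArrayList<String> list = fileDict.get(key);
        if (list == null) {
            list = new ArrayList<>();   // assign to the existing variable
            fileDict.put(key, list);    // register the new list under the key
        }
        for (String word : words) {
            list.add(word);
        }
    }

    public static void main(String[] args) {
        Map<String, ArrayList<String>> fileDict = new HashMap<>();
        addWords(fileDict, "atheism", new String[] {"Gott"}); // words of a0.txt
        addWords(fileDict, "atheism", new String[] {"ist"});  // words of a1.txt
        addWords(fileDict, "atheism", new String[] {"tot"});  // words of a2.txt
        System.out.println(fileDict); // prints {atheism=[Gott, ist, tot]}
    }
}
```

On Java 8 or later the null check can be folded into one call: fileDict.computeIfAbsent(key, k -> new ArrayList<>()) returns the existing list or creates and stores a new one.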

Gyan
  • so where should I initialize that array list? should I make it a global variable or something? –  Feb 20 '15 at 09:59
  • i tried this and it didn't work. I'm going to post what i've done in a new answer could you check it out? ok, i just placed it. –  Feb 20 '15 at 10:01
  • java doesn't like that. it says 'duplicate local variable' –  Feb 20 '15 at 10:05
  • Well, I was trying to give you an idea of what you could do. Not write the code for you. – Gyan Feb 20 '15 at 10:07
  • my bad i'm not trying to be a dick, i've been coding this for many hours now and i think my brain is sort of friend, please forgive me –  Feb 20 '15 at 10:09
  • you mean "sort of fried" .. Take a break :) – Gyan Feb 20 '15 at 10:11

First of all, you may want to replace this part of code

if (file.isDirectory()) 
{
    iterateDirectory(directory, globo_dict_fixed, fileDict, GLOBO_DICT );
}

with this

if (file.isDirectory()) 
{
    iterateDirectory(file, globo_dict_fixed, fileDict, GLOBO_DICT );
}

Next, you should put a new key-value pair into the hash map as soon as you find a new directory, before you start looking for words in that directory. That way you don't need to check for and create a new list and key-value pair for each file in the same directory (assuming you have a directory with lots of files).

For example

// note: storing a LinkedList requires fileDict to be declared as Map<String, List<String>>
for (File file : directory.listFiles()) 
{
    if (file.isDirectory()) 
    {
        // register the directory's key before visiting its files
        fileDict.put(getDirectoryName(file), new LinkedList<>());
        iterateDirectory(file, globo_dict_fixed, fileDict, GLOBO_DICT );
    } 
    else 
    {   
        final String directoryName = getDirectoryByFilePath(file); // you should extract this method from your code
        List<String> wordsList = fileDict.get(directoryName);
        if(wordsList == null) { // just in case
            wordsList = new LinkedList<>();
            fileDict.put(directoryName, wordsList);
        }

        String line; 
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader( file ))) {
            while((line = br.readLine()) != null) {
                String[] words = line.split(" ");//those are your words
                if(globo_dict_fixed == false)
                {
                    Data_GloboPop.populate_globo_dict( words, GLOBO_DICT );
                }
                else
                {
                    for (int i = 0; i < words.length; i++) 
                    {
                        wordsList.add(words[i]);
                    }
                }
            }
        }
    }
}

And if you do not need to access words by random index, I recommend using LinkedList instead of ArrayList.
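The two helper methods used above, getDirectoryName and getDirectoryByFilePath, are left for you to implement. A minimal sketch follows, assuming the key is simply the name of the immediate directory, which only works if directory names are unique across the tree. The class name DirectoryKeys and the /data/train/atheism path are made up for the example.

```java
import java.io.File;

final class DirectoryKeys {
    // Key for a directory: its own name.
    // Assumption: no two directories in the tree share a name.
    static String getDirectoryName(File directory) {
        return directory.getName();
    }

    // Key for a regular file: the name of the directory that directly contains it.
    static String getDirectoryByFilePath(File file) {
        File parent = file.getAbsoluteFile().getParentFile();
        return parent == null ? "" : parent.getName();
    }

    public static void main(String[] args) {
        // Hypothetical path; the file does not need to exist for this to work.
        File file = new File("/data/train/atheism/a0.txt");
        System.out.println(getDirectoryByFilePath(file));                       // prints atheism
        System.out.println(getDirectoryName(new File("/data/train/atheism"))); // prints atheism
    }
}
```

Both methods return the same key for a directory and for the files directly inside it, which is what lets the words of every file in atheism end up in one list.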

esin88
  • I'm feeling so confused, I've been programming for hours. Can you show me what you're talking about please? –  Feb 20 '15 at 09:55
  • so that is the full method? –  Feb 20 '15 at 10:06
  • Also, you might find some problems with files with same name in different directories. – esin88 Feb 20 '15 at 10:13
  • it's saying getDirectoryName and getDirectoryByFilePath are undefined –  Feb 20 '15 at 10:14
  • Yes, I commented on this in my answer: you should implement these methods. The first one must extract the directory name from a File that is a directory (probably just file.getName(), but consider the hierarchy). The second one should do the same, but for a file inside that directory. That way the program puts all the words from that file in the correct place in your hashmap. – esin88 Feb 20 '15 at 10:21