
I am trying to create a TreeMap<String,List<String,Integer>>. The conditions are:

  1. If a word does not exist in the TreeMap, insert it and associate it with an ArrayList of (docId, count).
  2. If the word is already present in the TreeMap, check whether the current docId matches an entry in its ArrayList and, if so, increase the count.

Below is the code I am using.

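import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StemTreeMap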
{
    private static final String r1 = "\\$DOC";
    private static final String r2 = "\\$TITLE";
    private static final String r3 = "\\$TEXT";
    private static Pattern p1,p2,p3;
    private static Matcher m1,m2,m3;

    public static void main(String[] args)
    {
        BufferedReader rd,rd1;
        String docid = null;
        String id;
        int tf = 0;
        //CountPerDocument cp = new CountPerDocument(docid, count);
        List<CountPerDocument> ls = new ArrayList<>();
        Map<String,List<CountPerDocument>> mp = new TreeMap<>();
        
        try
        {
            rd = new BufferedReader(new FileReader(args[0]));
            rd1= new BufferedReader(new FileReader(args[0]));
            int docCount = 0;
            String line = rd.readLine();
            p1 = Pattern.compile(r1);
            p2 = Pattern.compile(r2);
            p3 = Pattern.compile(r3);
            while(line != null)
            {
                m1 = p1.matcher(line);
                m2 = p2.matcher(line);
                m3 = p3.matcher(line);
                if(m1.find())
                {
                    docid = line.substring(5, line.length());
                    docCount++;
                    //System.out.println("The Document ID is :");
                    //System.out.println(docid);
                    line = rd.readLine();
                }
                else if(m2.find()||m3.find())
                {
                    line = rd.readLine();
                }
                else
                {
                    if(!(mp.containsKey(line))) // if the stem is not on the TreeMap
                    {
                        //System.out.println("The stem is not present in the tree");
                        //System.out.println("The stem is not present in the tree: " + line + "   The Document is :" + docid);
                        
                        tf = 1;
                        ls.add(new CountPerDocument(docid,tf));
                        mp.put(line, ls);   
                        System.out.println("Inserted string is: "+ mp.get(line));
                        line = rd.readLine();
                    }
                    else
                    {
                        if(ls.indexOf(docid) > 0) //if its last entry matches the current document number
                        {
                            //System.out.println("The Stem is present for the same docid so incrementing docid: " +line + ":"+ docid);
                            tf = tf+1;
                            ls.add(new CountPerDocument(docid,tf));
                            line = rd.readLine();
                        }
                        else
                        {
                            //System.out.println("Stem is present but not the same docid so inserting new docid: "+line + ":"+ docid);
                            tf = 1;
                            ls.add(new CountPerDocument(docid,tf)); //set did to the current document number and tf to 1
                            line = rd.readLine();
                        }
                    }
                }
            }
            rd.close();
            System.out.println("The Number of Documents in the file is:"+ docCount);
            
            //Write to an output file
            String l = rd1.readLine();
            File f = new File("dictionary.txt");
            if (f.createNewFile())
            {
                System.out.println("File created: " + f.getName());
            }
            else 
            {
                System.out.println("File already exists.");
                Path path = Paths.get("dictionary.txt");
                Files.deleteIfExists(path);
                System.out.println("Deleted Existing File:: Creating New File");
                f.createNewFile();    
            }
            FileWriter fw = new FileWriter("dictionary.txt");
            fw.write("The Total Number of Stems: " + mp.size() +"\n");
            /*Set<Map.Entry<String,List<CountPerDocument>>> entries = mp.entrySet();
            
            for(Map.Entry<String,List<CountPerDocument>> entry : entries)
            {
                fw.write(entry.getKey() + entry.getValue());
            }   */
            
            Iterator<Map.Entry<String, List<CountPerDocument>>> iterator = mp.entrySet().iterator();
             
            Map.Entry<String, List<CountPerDocument>> entry = null;
             
            while(iterator.hasNext())
            {
                entry = iterator.next();
                fw.write(entry.getKey() + "=>" + entry.getValue() + "\n" );
            }
            
            //System.out.println(mp.get("todai"));
            fw.close();
            
        }catch(IOException e)
        {
            e.printStackTrace();
        }
    }
}

For the ArrayList entries I am using this class:

public class CountPerDocument
{
    private final String documentId;
    private final int count;
    
    CountPerDocument(String documentId, int count)
    {
        this.documentId = documentId;
        this.count = count;
    }

    public String getDocumentId()
    {
        return this.documentId;
    }

    public int getCount()
    {
        return this.count;
    }
    
    @Override
    public String toString()
    {
        return this.documentId + "-" + this.count;
    }
}

When I print what I am inserting into the map with mp.get(line), I get the following output:

Stem is: attempt
DocId is: LA010190-0002
TF is : 1
Inserted string is: [LA010190-0001-1, LA010190-0001-1, LA010190-0001-1, …, LA010190-0002-1, LA010190-0002-1, LA010190-0002-1, …, LA010190-0002-1]

(Abridged: the actual list repeats LA010190-0001-1 many times and then LA010190-0002-1 hundreds of times.)

I'm not sure why so many entries are being inserted. Am I printing the output wrong, or is there something wrong with my approach?

Mark Rotteveel
gautam raj
  • Does this answer your question? [Why does my ArrayList contain N copies of the last item added to the list?](https://stackoverflow.com/questions/19843506/why-does-my-arraylist-contain-n-copies-of-the-last-item-added-to-the-list) – Tom Dec 22 '20 at 16:32
  • See the paragraph "Adding the Same Object" in the accepted answer there – Tom Dec 22 '20 at 16:33
  • I am already using new CountPerDocument() every time I add an entry: ls.add(new CountPerDocument(docid,tf)); – gautam raj Dec 22 '20 at 16:42
  • You're wondering why so many CountPerDocument instances are in the list, so the problem is obviously the list you put into your map and not CountPerDocument. – Tom Dec 22 '20 at 16:53
  • Looks like it. I am confused about what I am doing wrong here. Do I need to add an extra column to the list to have the correct values printed? – gautam raj Dec 22 '20 at 17:00
  • You use the same list for all entries in the map. – Mark Rotteveel Dec 22 '20 at 17:02
  • Do I need to use a different list for each entry? If so, how do I do that? – gautam raj Dec 22 '20 at 17:05
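A minimal sketch of the fix Mark Rotteveel points at, keeping the question's list-based design. It assumes stems arrive one at a time together with the current document id, in document order. `PerWordListIndex`, `addStem`, and the nested `Posting` class (a stand-in for the question's immutable `CountPerDocument`) are illustrative names. Each new stem gets its own `ArrayList`, and because the entries are immutable, a repeat within the same document replaces the last entry with an incremented copy instead of appending another one. Comparing against the last entry's id also sidesteps the question's `ls.indexOf(docid)` check, which compares a `String` against `CountPerDocument` objects and so never finds a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PerWordListIndex {

    // Hypothetical stand-in for the question's immutable CountPerDocument class.
    static final class Posting {
        final String documentId;
        final int count;

        Posting(String documentId, int count) {
            this.documentId = documentId;
            this.count = count;
        }

        @Override
        public String toString() {
            return documentId + "-" + count;
        }
    }

    // Records one occurrence of stem in document docId.
    static void addStem(Map<String, List<Posting>> index, String stem, String docId) {
        List<Posting> postings = index.get(stem);
        if (postings == null) {
            // A brand-new list for every new stem, instead of one shared list.
            postings = new ArrayList<>();
            index.put(stem, postings);
        }
        int last = postings.size() - 1;
        if (last >= 0 && postings.get(last).documentId.equals(docId)) {
            // Same document as the last entry: replace it with an incremented copy.
            postings.set(last, new Posting(docId, postings.get(last).count + 1));
        } else {
            // First occurrence of this stem in this document.
            postings.add(new Posting(docId, 1));
        }
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> index = new TreeMap<>();
        addStem(index, "attempt", "LA010190-0001");
        addStem(index, "attempt", "LA010190-0001");
        addStem(index, "todai", "LA010190-0001");
        addStem(index, "attempt", "LA010190-0002");
        // prints {attempt=[LA010190-0001-2, LA010190-0002-1], todai=[LA010190-0001-1]}
        System.out.println(index);
    }
}
```

The "last entry" check is enough only because documents are read sequentially, so all occurrences of a stem within one document end up next to each other in that stem's list.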

1 Answer


Primitive versus object

Java collections hold objects (object references, technically speaking), not primitives. So you cannot use int when defining a List. Use the Integer class, the OOP equivalent of the int primitive.
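To make the point concrete, here is a small sketch (`BoxingDemo` and `sum` are illustrative names). `List<int>` is rejected by the compiler, while `List<Integer>` works, and auto-boxing converts between `int` and `Integer` when values go in and come out.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    // List<int> counts = new ArrayList<>();  // does not compile: type arguments must be reference types

    static int sum(List<Integer> counts) {
        int total = 0;
        for (int c : counts) { // each Integer is auto-unboxed to int
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> counts = new ArrayList<>();
        counts.add(3); // the int literal is auto-boxed to Integer
        counts.add(4);
        System.out.println(sum(counts)); // prints 7
    }
}
```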

List is of one type

No such thing as List < String , Integer >. A list is a single list, holding a series of elements that are all of one type. You can have either a List < String > or List < Integer > but not the combination.
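If you do want a list of (document id, count) pairs, one option is to make the element type itself a pair. This sketch uses the JDK's `Map.Entry` interface with its `AbstractMap.SimpleEntry` implementation; a small class like the question's `CountPerDocument` works the same way. `PairListDemo` and `samplePostings` are illustrative names, and the sample values are made up.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PairListDemo {
    // List<String, Integer> is not a legal type: List takes exactly one type argument.
    // Here that single type argument is a pair type, Map.Entry<String, Integer>.
    static List<Map.Entry<String, Integer>> samplePostings() {
        List<Map.Entry<String, Integer>> postings = new ArrayList<>();
        postings.add(new AbstractMap.SimpleEntry<>("LA010190-0001", 2));
        postings.add(new AbstractMap.SimpleEntry<>("LA010190-0002", 1));
        return postings;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> p : samplePostings()) {
            System.out.println(p.getKey() + " -> " + p.getValue());
        }
    }
}
```

Finding a given document id in such a list still means scanning it, which is why a map keyed by document id is the better fit.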

Map of maps

Apparently you want to do a word count across multiple documents yet track the count of each word’s usage by document. You want to associate each word with a collection that associates each document to a count of that word within that document.

The collection for associating objects is a Map. So you need a map that maps each word to another map, a mapping of document identifier to count. That is, a map whose keys are a string and whose values are a map. Each word owns a map.

Map< String , Map< String , Integer > >

…where the first String refers to the word being counted, and the second String refers to the document identifier.

Your logic should be something like this:

As you encounter each word of each document, find the word as a key in the outer map. If not found, put the key and a new empty inner map into the outer map. At this point you have an inner map in hand, either a pre-existing inner map or a new empty inner map.

In that inner map, search for the document identifier. If not found, put the doc id along with a new Integer set to zero. So now you have in hand either the pre-existing Integer or a new Integer. Add one to that Integer to get a new Integer. Put that doc id with the new Integer back into the inner map.
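The walk-through above can be sketched roughly as follows, assuming each word arrives together with the current document id. `WordDocCounter`, `record`, and `asMap` are hypothetical names. Instead of literally putting a zero first, the code treats a missing count as zero, which ends in the same state. The inner map is a `HashMap` because sorted document ids may not matter to you; swap in `TreeMap` if they do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class WordDocCounter {

    // Outer map: word -> (document id -> count of that word in that document).
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    void record(String word, String docId) {
        // Find the word in the outer map; if absent, put a new empty inner map.
        Map<String, Integer> perDoc = counts.get(word);
        if (perDoc == null) {
            perDoc = new HashMap<>();
            counts.put(word, perDoc);
        }
        // The inner map is now "in hand": look up the document id, treating absent as zero.
        Integer current = perDoc.get(docId);
        if (current == null) {
            current = 0;
        }
        // Integer is immutable, so put a new Integer back under the same doc id.
        perDoc.put(docId, current + 1);
    }

    Map<String, Map<String, Integer>> asMap() {
        return counts;
    }

    public static void main(String[] args) {
        WordDocCounter c = new WordDocCounter();
        c.record("attempt", "LA010190-0001");
        c.record("attempt", "LA010190-0001");
        c.record("attempt", "LA010190-0002");
        c.record("todai", "LA010190-0002");
        System.out.println(c.asMap());
    }
}
```

On Java 8 or later, the body of `record` can be compressed into `counts.computeIfAbsent(word, w -> new HashMap<>()).merge(docId, 1, Integer::sum);`, which performs the look-up, the default, and the put-back in one expression.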

Alternatively, you could use AtomicInteger instead of Integer. Then you could call its incrementing method (incrementAndGet) rather than replacing one immutable Integer with another immutable Integer.
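A sketch of the AtomicInteger variant, with hypothetical names `AtomicWordDocCounter`, `record`, and `count`. It uses the Java 8 `Map.computeIfAbsent` method to create the inner map and the counter only when they are missing, then mutates the counter in place with `incrementAndGet`, so nothing has to be put back. In a single-threaded program the `AtomicInteger` simply serves as a mutable box; its thread-safety is not needed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicWordDocCounter {

    // word -> (document id -> mutable counter)
    private final Map<String, Map<String, AtomicInteger>> counts = new TreeMap<>();

    // Records one occurrence and returns the updated count.
    int record(String word, String docId) {
        return counts
                .computeIfAbsent(word, w -> new HashMap<>())       // inner map created on first use
                .computeIfAbsent(docId, d -> new AtomicInteger())  // counter starts at 0
                .incrementAndGet();                                // mutate in place
    }

    // Returns the count for (word, docId), or 0 if it was never recorded.
    int count(String word, String docId) {
        Map<String, AtomicInteger> perDoc = counts.get(word);
        if (perDoc == null || !perDoc.containsKey(docId)) {
            return 0;
        }
        return perDoc.get(docId).get();
    }

    public static void main(String[] args) {
        AtomicWordDocCounter c = new AtomicWordDocCounter();
        c.record("attempt", "LA010190-0001");
        c.record("attempt", "LA010190-0001");
        System.out.println(c.count("attempt", "LA010190-0001")); // prints 2
    }
}
```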

As you must be a student doing homework, I will leave the rest for you to work out.

Tip: Notice how writing your logic out in plain prose provides clarification as well as an outline to follow when you write the code.

Basil Bourque
  • You mean something like this? mp.put(line,(new TreeMap())); – gautam raj Dec 23 '20 at 06:47
  • @gautamraj Yes. Or `mp.put(line,(new TreeMap()));` Though I am not sure I would use `TreeMap` in particular unless keeping the document identifiers sorted matters to you. And, shouldn't that be `word` rather than `line`? – Basil Bourque Dec 23 '20 at 07:12
  • I am using stemming to split lines into words, so in this context line = word. How do I insert the docid and count into this empty tree? – gautam raj Dec 23 '20 at 07:20
  • Regarding "How do I insert the docid and count into this empty tree ??? "… The very same way you put a key and value into the outer map. (a) Reread my walk through the logic. Hint: "in hand" means an object reference. (b) What is returned to you when calling [`Map::get`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Map.html#get(java.lang.Object))? – Basil Bourque Dec 23 '20 at 07:25
  • I did read your logic; it's very close to what I want. For the outer TreeMap, "mp" is an instance I created, so I can call mp.put(). But the inner TreeMap is an empty map without a named instance, so I am not sure how to call put() on it. – gautam raj Dec 23 '20 at 07:32
  • You created an instance of the inner map in your first comment above, an instance of an empty map. What does `new …` return? Next, answer my (b) question to work it all out. You have all the parts you need; just think it through. – Basil Bourque Dec 23 '20 at 07:35
  • When I created an empty Map and printed it to see what's being passed, the output I got was "the value is: new :{}", so an empty value is passed. – gautam raj Dec 23 '20 at 07:38
  • new returns an Object. Correct me if I am wrong. mp.get returns all values stored in the TreeMap. Am I supposed to use getValue() to insert the docid and count into the empty TreeMap? – gautam raj Dec 23 '20 at 08:04
  • Figured it out. Thank you for your suggestions. – gautam raj Dec 23 '20 at 08:50
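To illustrate Basil Bourque's hints about `new` and `Map::get`, here is a small sketch (`InnerMapReferenceDemo` and `demo` are illustrative names; `mp` mirrors the question's variable). `new TreeMap<>()` returns a reference to the empty inner map, the outer map stores that same reference, and `mp.get(word)` hands it back. Calling `put` on the returned map therefore updates the map that `mp` holds; no `getValue()` is involved.

```java
import java.util.Map;
import java.util.TreeMap;

class InnerMapReferenceDemo {

    static Map<String, Map<String, Integer>> demo() {
        Map<String, Map<String, Integer>> mp = new TreeMap<>();

        // `new` returns a reference to an empty inner map; mp stores that same reference.
        Map<String, Integer> inner = new TreeMap<>();
        mp.put("attempt", inner);

        // Map::get returns the very same object, so put() on it changes what mp holds.
        mp.get("attempt").put("LA010190-0001", 1);

        // The change is also visible through the original variable: both point at one map.
        inner.put("LA010190-0002", 1);
        return mp;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {attempt={LA010190-0001=1, LA010190-0002=1}}
    }
}
```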