1

I am a C programmer who has recently moved to Java. I am trying to convert a C program into a Java program. The C program simply calculates term frequency and inverse document frequency (tf/idf).

I created one data class:

public class Data {
    private String fileName,fileText;
    private int fileId;
    private float value;

    public void addData(String fileName, String fileText, float value){
        this.fileName = fileName;
        this.fileText = fileText;
        this.value = value;
    }

    public int getFileId(){
        return this.fileId;
    }


    public String getFileName(){
        return this.fileName;
    }

    public String getFileText(){
        return this.fileText;
    }

    public float getValue(){
        return this.value;
    }
}

This class is responsible for storing the file name, the file text, and a value (the tf or idf value).

The following class is responsible to store data:

import java.util.HashMap;

public class main {
    public static void main(String[] args) {

        HashMap<String, Data> map = new HashMap<String, Data>();
        Data dt = new Data();

        dt.addData("abc.txt", "some contents", 2);
        map.put("1",dt);
        dt.addData("w", "some more contents in second file", 3);
        map.put("2",dt);

        System.out.println(map);

    }

}

When I print the map, it gives me some weird values. Do I have to declare an array of the Data class? I don't know how many files there are, so I cannot use a fixed array size.

Also, how can I calculate TF and IDF based on this data structure?

In the C program, I simply read the files and, for each word, divide its count by the total number of words to get TF, and divide the word's count by the total number of occurrences of that word across all files to get IDF. I do not know how to do this with the data structure above.

I get weird values. Maybe these are objects:

{2=test2.Data@19821f, 1=test2.Data@19821f}

Is there any way to get a specific value from the Data class using getFileName and the other getter methods?

durron597
Tweet
  • Rather than saying the `println()` "gives me some weird values", perhaps you could show what output you *do* get. Most Java programmers can guess what you're seeing, but it's good to be sure. Your weird values might be perfectly normal to another reader. – Greg Hewgill Dec 23 '10 at 22:20
  • The core of your question appears to be "When I print map, it gives me some weird values". What were you expecting, vs what you got? – Mud Dec 23 '10 at 22:20
  • Your main class has a bug that's somewhat of an aside from your question. You are attempting to insert the same Data instance into your map multiple times. You might be surprised to know that the second call to addData will actually overwrite the values already inserted into the map. Instead, call new Data() prior to each addData statement. – csj Dec 23 '10 at 22:24
  • Here are the values {2=test2.Data@19821f, 1=test2.Data@19821f} and I think these are objects. I am sorry for calling it weird values. I want to print all the values that I added in Data class object. And any idea for second question? – Tweet Dec 23 '10 at 22:26
  • Beware of using the same object ( is like using the same struct ) because you're just dropping the previous values. – OscarRyz Dec 23 '10 at 22:36

4 Answers

1

For question one, unless you override toString(), you are unlikely to get any meaningful output just by printing objects directly to stdout. The 'test2.Data@19821f' is what Object.toString() returns: the class name, then '@', then the object's hash code in hexadecimal. In this case, it quite helpfully shows that both your values are the same object.

You can open/read files using java.io.File and java.io.FileInputStream. A map from strings to integers java.util.Map<String,Integer> will probably help with counting words in those files.
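As a minimal sketch of the word-counting idea, assuming the file text has already been read into a String: the class name `WordCount`, the method `countWords`, and the choice to lower-case and split on non-alphanumeric characters are all illustrative assumptions, not something from the original program.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, used only for illustration.
class WordCount {

    // Count how often each word occurs in the given text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: words are runs of letters/digits, compared case-insensitively.
        for (String word : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                // Insert 1 for a new word, otherwise add 1 to the existing count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("some contents, some more contents");
        System.out.println(counts.get("some"));
        System.out.println(counts.get("more"));
    }
}
```

Because the map grows as new words appear, there is no need to know the number of distinct words (or files) in advance, which is what a fixed-size C array would require.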

There doesn't seem to be much need for your data class for this simple application. You've already described the algorithm to follow, it's just a case of writing it in Java syntax.

Welcome to the wonderful world of type safety and not worrying about memory leaks.

OrangeDog
  • Could you please explain a bit more? What do you mean when you say I can do it with only a file reader? I searched for an example, but I couldn't find any :( – Tweet Dec 23 '10 at 22:31
  • If you are trying to write an application to read files and count words, then you don't need this Data class: just read some files and count the words. – OrangeDog Dec 23 '10 at 22:32
  • To down-voters: please also explain why, or nobody is any the wiser. – OrangeDog Dec 23 '10 at 22:37
  • Thanks, but each word will have its own frequency, while in Map I can just store one frequency for whole string. In addition, I also need IDF then how can I get IDF with this data structure? – Tweet Dec 23 '10 at 22:55
  • My intent was that you would map each word to its frequency. Obviously you need to split strings up into words at some point. As for IDF, you've already told us how to calculate that. – OrangeDog Dec 23 '10 at 23:02
0

You're creating only one instance of Data. You probably want to do something more like:

    Data dt = new Data();
    dt.addData("abc.txt", "some contents", 2);
    map.put("1",dt);

    dt = new Data();
    dt.addData("w", "some more contents in second file", 3);
    map.put("2",dt);

Or better yet, change Data to take the properties in its constructor:

    map.put("1", new Data("abc.txt", "some contents", 2));
    map.put("2", new Data("w", "some more contents in second file", 3));
Laurence Gonsalves
0

It's not clear what your question is (see comments below your question), but there are a few things wrong with your code. addData is a misleading name for a method that replaces the data in the object. But the real problem is here:

  dt.addData("abc.txt", "some contents", 2);
  map.put("1",dt);
  dt.addData("w", "some more contents in second file", 3);
  map.put("2",dt);

This results in a map containing two entries, both of which refer to the same Data object, which will contain the values from the last call to addData. Change addData to be a constructor:

public Data(String fileName, String fileText, float value) {

Then change your map code to this:

map.put("1", new Data("abc.txt", "some contents", 2));
map.put("2", new Data("w", "some more contents in second file", 3));
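Put together, the constructor-based version might look like the sketch below. The field and getter names follow the question's Data class; the `DataDemo` class and its `build` method are hypothetical names added only to make the example runnable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver class, used only for illustration.
class DataDemo {

    // Build a map in which every key refers to its own Data instance.
    static Map<String, Data> build() {
        Map<String, Data> map = new HashMap<>();
        map.put("1", new Data("abc.txt", "some contents", 2));
        map.put("2", new Data("w", "some more contents in second file", 3));
        return map;
    }

    public static void main(String[] args) {
        Map<String, Data> map = build();
        // Look up one entry and read a single field through its getter.
        System.out.println(map.get("1").getFileName());
        System.out.println(map.get("2").getValue());
    }
}

// Data with a constructor instead of addData, so each object is filled exactly once.
class Data {
    private final String fileName;
    private final String fileText;
    private final float value;

    Data(String fileName, String fileText, float value) {
        this.fileName = fileName;
        this.fileText = fileText;
        this.value = value;
    }

    String getFileName() { return fileName; }
    String getFileText() { return fileText; }
    float getValue() { return value; }
}
```

Calling `map.get(key)` followed by a getter is also how a specific value can be read back from the map.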
Mud
0

Most likely you are expecting to see the Data string representation.

When you invoke println on any object (including the map), the system calls that object's toString() method.

In the case of the map, the toString method returns the content of the map, in a format similar to this:

{key=value, key2=value2}

That is, it prints the key/value pairs it contains.

Now, the key and value are objects too, so their own toString() methods are invoked. For a String, the result is the string itself. But in the case of Data, since you haven't supplied your own implementation, you get the default, which is the fully qualified class name, then @, then the object's hashCode() in hexadecimal. So you are probably getting something like:

 {1=Data@a6f2be, 2=Data@a6f2be}

To change this you have to override the toString() method:

 class Data {
     // ... fields and getters as before ...

     @Override
     public String toString() {
         // Return something meaningful, for example:
         return String.format("Data(fileName = %s, fileText = %s, value = %s)",
                 this.fileName, this.fileText, this.value);
     }
 }

As for the second question, you'll do it basically the same way as you would in C. Perhaps you should create a calculate() method that opens the file and starts the counting. This probably deserves its own question.
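As a rough illustration only, here is one way the calculation could be laid out in Java. It assumes each document has already been split into a list of words, and it uses the commonly cited definitions tf = count / total words and idf = log(N / number of documents containing the term), which is not quite the formula described in the question. The class name `TfIdf` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, used only for illustration.
class TfIdf {

    // Term frequency: occurrences of the term divided by the document's word count.
    static double tf(List<String> doc, String term) {
        if (doc.isEmpty()) {
            return 0.0; // avoid dividing by zero for an empty document
        }
        long occurrences = doc.stream().filter(term::equals).count();
        return (double) occurrences / doc.size();
    }

    // Inverse document frequency: log(total documents / documents containing the term).
    static double idf(List<List<String>> docs, String term) {
        long containing = docs.stream().filter(d -> d.contains(term)).count();
        if (containing == 0) {
            return 0.0; // assumption: a term that appears nowhere gets weight 0
        }
        return Math.log((double) docs.size() / containing);
    }

    static double tfIdf(List<String> doc, List<List<String>> docs, String term) {
        return tf(doc, term) * idf(docs, term);
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("some", "contents");
        List<String> second = Arrays.asList("some", "more", "contents", "in", "second", "file");
        List<List<String>> docs = Arrays.asList(first, second);

        System.out.println(tf(first, "some"));            // 1 of 2 words
        System.out.println(idf(docs, "some"));            // appears in every document
        System.out.println(tfIdf(second, docs, "more"));  // appears only in the second
    }
}
```

Since the documents live in a List, the number of files does not have to be known up front, and the per-word results could be stored in a Map<String, Double> per document rather than in a single float field.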

Community
OscarRyz
  • Thanks for your answer... Where should I call calculate method? In data class or in main class? I am sorry I belong to old structure based school. I can calculate words as I read file and calculate TF but what about IDF? How do I calculate IDF? Should I create another field in Data class to store IDF values or what? If you want I can post another question separately... – Tweet Dec 23 '10 at 22:44
  • Exactly in the same place as you would call in C `calculate( thisStruct )` :) I would call it in main. As for IDF – OscarRyz Dec 23 '10 at 23:06
  • Ahh, thanks :) I thought maybe there was another OOP rule about where to declare such functions. Thanks for your help. Do you have any idea how to use an array in my Data class? I have to store the frequency of each word, and right now I am using one float that can only store one word's frequency. Any idea how to do it? I can declare an array of float in the Data class, but I don't know how to use it :p. In addition, how can I get a specific value from a Data object? For example, calling getFileName to get only one file name? – Tweet Dec 23 '10 at 23:20