
I am parsing a file with more than 4M lines. Each line has the form a^b^c^d^...^.... I want all the unique points from the file, where uniqueness is decided by the first two entries only. This is what I do:

String str; // holds the current line read from the file
Set<String> lines = new LinkedHashSet<String>();
Set<String> set = Collections.synchronizedSet(lines);
// for each line read into str:
String[] str1 = str.split("\\^");
set.add(str1[0] + "^" + str1[1]);

This gives me the unique combinations of the 1st and 2nd entries in the file. However, I also want the 3rd entry (a timestamp), i.e. str1[2], associated with each of those points. The new file should be of the form:

  str1[0]^str1[1]^str1[2] 

How do I go about doing this?

RFT

2 Answers


There are a few solutions that come to mind.

  1. Make a class for the 3 entries. Override the equals method (and hashCode along with it) so that only the first 2 entries are compared: 2 objects are equal if their first 2 entries are equal. Now add all the items to the set. What you'll get in your set is one element per unique first and second point, carrying the first occurrence of its timestamp.

  2. Another solution is to keep two collections: a set with only your 2 points, and a list with your 2 points + timestamp. Then you can do set.contains(...) to check whether you have already seen the point, and if you haven't, add it to the list with 2 points + timestamp.
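As a rough illustration of the second approach, here is a minimal sketch. The class name SeenKeyFilter and the method firstOccurrences are made up for this example, and it assumes that lines with fewer than three fields can simply be skipped. It relies on Set.add returning false when the key is already present, so a separate contains call is not needed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original question or answer.
class SeenKeyFilter {

    // Returns "a^b^c" for the first line seen with each distinct (a, b) pair,
    // in the order the pairs first appear.
    static List<String> firstOccurrences(Iterable<String> lines) {
        Set<String> seen = new HashSet<String>();      // only the 2 points
        List<String> result = new ArrayList<String>(); // 2 points + timestamp

        for (String line : lines) {
            String[] parts = line.split("\\^");
            if (parts.length < 3) {
                continue; // assumption: malformed lines are ignored
            }
            String key = parts[0] + "^" + parts[1];
            // add() returns false if the key was already in the set
            if (seen.add(key)) {
                result.add(key + "^" + parts[2]);
            }
        }
        return result;
    }
}
```

For a file with 4M lines you would probably feed this from a BufferedReader loop rather than an in-memory list, and write each new entry out as soon as it is found.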

Nactive
  • Make sure you have proper implementation of both equals and hashCode, see for example http://stackoverflow.com/questions/27581/overriding-equals-and-hashcode-in-java – Kristian Mar 06 '12 at 20:27
  • @Nactive I thought about the 2nd solution and found it a bit clumsy and thought if there was a smarter way. – RFT Mar 06 '12 at 20:29
  • Of course the first implementation is better than the second one. But the second one is just easier, and since you already use set.add(str1[0]+"^"+str1[1]) to add unique 'points', I thought you might be interested in a fast way instead of defining a new class and stuff like that. – Nactive Mar 06 '12 at 20:37

Create a class that holds all the information you need and store instances of it in the set, but have equals/hashCode consider only the first two fields. Then you can do:

Set<Point> set = new HashSet<Point>();
String str1[] = str.split("\\^");
set.add(new Point(str1[0], str1[1], str1[2]));

Using:

public class Point {

    String str1;
    String str2;
    String str3;

    public Point(String str1, String str2, String str3) {
        this.str1 = str1;
        this.str2 = str2;
        this.str3 = str3;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((str1 == null) ? 0 : str1.hashCode());
        result = prime * result + ((str2 == null) ? 0 : str2.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Point other = (Point) obj;
        if (str1 == null) {
            if (other.str1 != null)
                return false;
        } else if (!str1.equals(other.str1))
            return false;
        if (str2 == null) {
            if (other.str2 != null)
                return false;
        } else if (!str2.equals(other.str2))
            return false;
        return true;
    }
}
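
If you are on Java 7 or later, the same idea can be written more compactly with java.util.Objects. The sketch below is only an illustration: the names PointDemo, CompactPoint and uniquePoints are invented for this example, and it assumes lines with fewer than three fields should be skipped. It uses a LinkedHashSet so the points come out in file order, and it depends on the fact that Set.add leaves the set unchanged when an equal element is already present, so the first timestamp seen for each point is the one that is kept.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical names throughout; a sketch, not the original answer's class.
class PointDemo {

    static final class CompactPoint {
        final String first;
        final String second;
        final String timestamp;

        CompactPoint(String first, String second, String timestamp) {
            this.first = first;
            this.second = second;
            this.timestamp = timestamp;
        }

        // Only the first two fields take part in hashCode and equals.
        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof CompactPoint)) return false;
            CompactPoint other = (CompactPoint) obj;
            return Objects.equals(first, other.first)
                    && Objects.equals(second, other.second);
        }

        // Produces the output format asked for: first^second^timestamp
        @Override
        public String toString() {
            return first + "^" + second + "^" + timestamp;
        }
    }

    static Set<CompactPoint> uniquePoints(Iterable<String> lines) {
        Set<CompactPoint> set = new LinkedHashSet<CompactPoint>();
        for (String line : lines) {
            String[] p = line.split("\\^");
            if (p.length < 3) {
                continue; // assumption: malformed lines are ignored
            }
            // No effect if a point with the same first two fields exists.
            set.add(new CompactPoint(p[0], p[1], p[2]));
        }
        return set;
    }
}
```

Writing the result to the new file is then a matter of iterating over the set and printing each element, since toString already yields the a^b^c form.
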
Kristian