3

I have two Strings that look like this:

String str1 = "[0.7419,0.7710,0.2487]";
String str2 = "[\"0.7710\",\"0.7419\",\"0.2487\"]";

and I want to compare them so that they count as equal despite the different order.

Which is the fastest and simplest way to do that?

Should I split each one into arrays and compare the two arrays? Or not? I guess I have to remove the "[", "]" and '"' characters first to make it cleaner, so I did. I also replaced the "," with " ", but I don't know if this helps...

Thanks in advance :)

Edit: My Strings will not always be a set of doubles or floats. They may also be actual words or a set of characters.

T. Kofidis
  • Your strings are representations of a `Set` of doubles (at least you said so). So parse them as such, then compare. – M. Prokhorov Sep 04 '17 at 12:29
  • Will you have only numbers inside your strings? Or can it be any characters? – Schidu Luca Sep 04 '17 at 12:34
  • I would parse those numbers into Double and put them into a List. Then sort and compare one by one. – FuriousSpider Sep 04 '17 at 12:36
  • This was just an example I posted. They won't always be doubles or floats, they may be a set of characters as well. But thanks! – T. Kofidis Sep 04 '17 at 12:36
  • I'm sorry, I will edit the post to make my question clearer. – T. Kofidis Sep 04 '17 at 12:36
  • @T.Kofidis, Ok, so not a `Set` of doubles, just a `Set` of some other element type. It changes how you parse them, but doesn't really change how you compare in general. – M. Prokhorov Sep 04 '17 at 12:39
  • A slower but nicer solution would be to read them as JSONArrays, because that's what they look like. Maybe it will be easy to compare them then. – Jack Flamp Sep 04 '17 at 12:42
  • Are there ever duplicates, or are these guaranteed to be sets? Because comparing them while tracking duplicates without regard to order becomes a different beast of a task when all of those factors are present. – Rogue Sep 04 '17 at 13:30

5 Answers

2

Because you have a mixed result type, you first need to handle it as mixed input.

Here's how I would strip the brackets and quotes, particularly for longer strings:

private Stream<String> parseStream(String in) {
    //we'll skip regex for now and can simply hard-fail bad input later
    //you can also do some sanity checks outside this method
    return Arrays.stream(in.substring(1, in.length() - 1).split(",")) //strip the surrounding brackets
        .map(s -> !s.startsWith("\"") ? s : s.substring(1, s.length() - 1)); //strip surrounding quotes
}

Following up, we now have a stream of strings, which need to be parsed into either a primitive or a string (since I'm assuming we don't have some weird form of object serialization):

private Object parse(String in) {
    //attempt to parse as number first. Any number can be parsed as a double/long
    try {
        //note: mixing double and long in one ternary promotes the result to double,
        //so numeric input always comes back as a Double here
        return in.contains(".") ? Double.parseDouble(in) : Long.parseLong(in);
    } catch (NumberFormatException ex) {
        //it's not a number, so it's either a boolean or a plain string
        if (in.equalsIgnoreCase("true") || in.equalsIgnoreCase("false")) {
            return Boolean.parseBoolean(in);
        }
        return in; //not a boolean either, so keep the string as-is
    }
}
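
One subtlety worth checking in `parse`: in Java, a conditional expression whose two branches are a `double` and a `long` undergoes binary numeric promotion, so its result is always a `double`. The small sketch below shows the effect and one way to keep the two types apart, assuming that is what you want. The class and method names are hypothetical and not part of the answer.

```java
final class TernaryPromotionDemo {
    // Mirrors the ternary from parse(): both branches are primitives,
    // so the long branch is promoted to double before boxing.
    static Object promoted(String in) {
        return in.contains(".") ? Double.parseDouble(in) : Long.parseLong(in);
    }

    // Boxing each branch explicitly keeps Long and Double distinct.
    static Object distinct(String in) {
        return in.contains(".") ? (Object) Double.valueOf(in) : (Object) Long.valueOf(in);
    }

    public static void main(String[] args) {
        System.out.println(promoted("42").getClass().getSimpleName()); // Double
        System.out.println(distinct("42").getClass().getSimpleName()); // Long
    }
}
```

Whether promotion is a problem depends on the data: with it, "1" and "1.0" end up equal, which may or may not be the intent.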

Using this, we can then convert our mixed stream to a mixed collection:

Set<Object> objs = this.parseStream(str1).map(this::parse).collect(Collectors.toSet());
Set<Object> comp = this.parseStream(str2).map(this::parse).collect(Collectors.toSet());
//we're using sets, keep in mind the nature of different collections and how they compare their elements here
if (objs.equals(comp)) {
    //we have a matching set
}

Lastly, an example of a sanity check would be ensuring that the input string starts and ends with the appropriate brackets. Despite what others said, I learned the set syntax as {a, b, ...c} and the series/list syntax as [a, b, ...c]; the two compare differently here, since a list is ordered and can contain duplicates while a set is neither.
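
The sanity checks mentioned above could look something like the following minimal sketch. The class name `BracketedListParser` and the choice to throw `IllegalArgumentException` are assumptions for illustration, and elements are kept as plain strings rather than parsed into numbers. It also assumes that no element contains a comma.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

final class BracketedListParser {
    // Validates the surrounding brackets, then splits and unquotes the elements.
    static Set<String> toSet(String in) {
        if (in == null || in.length() < 2 || !in.startsWith("[") || !in.endsWith("]")) {
            throw new IllegalArgumentException("expected input of the form [a,b,c]: " + in);
        }
        String body = in.substring(1, in.length() - 1).trim();
        if (body.isEmpty()) {
            return Collections.emptySet(); // "[]" has no elements
        }
        return Arrays.stream(body.split(","))
                .map(String::trim)
                .map(BracketedListParser::unquote)
                .collect(Collectors.toSet());
    }

    // Strips one pair of surrounding double quotes, if present.
    static String unquote(String s) {
        boolean quoted = s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
        return quoted ? s.substring(1, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        Set<String> a = toSet("[0.7419,0.7710,0.2487]");
        Set<String> b = toSet("[\"0.7710\",\"0.7419\",\"0.2487\"]");
        System.out.println(a.equals(b)); // true
    }
}
```

Failing fast on malformed input keeps the later comparison from silently treating two broken strings as equal.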

Rogue
1

This can be done by putting the elements of each string into a set of strings. Using a TreeSet also keeps them sorted. Simply convert both strings into sets and compare them with the equals method. Try the code below:

String str1 = "[0.7419,0.7710,0.2487]";
String str2 = "[\"0.7710\",\"0.7419\",\"0.2487\"]";
String jsonArray = new JSONArray(str2).toString();
Set<String> set1 = new TreeSet<String>(Arrays.asList(str1.replace("[", "").replace("]", "").split(",")));
Set<String> set2 = new TreeSet<String>(Arrays.asList(jsonArray.replace("[", "").replace("]", "").replace("\"", "").split(",")));
if (set1.equals(set2)) {
    System.out.println("str1 and str2 are equal");
}

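In the code above I used JSONArray to help get rid of the "\" character. (The backslashes only escape the quotes in the Java source; the quotes themselves are removed by the replace("\"", "") call.)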

Note: this will not work if an element occurs a different number of times in the two strings, because a set does not keep duplicates.

In that case, use a list, which keeps duplicate elements, and sort it before comparing:

String str1 = "[0.7419,0.7710,0.2487]";
String str2 = "[\"0.7710\",\"0.7419\",\"0.2487\"]";
String jsonArray = new JSONArray(str2).toString();
List<String> list1 = new ArrayList<String>(Arrays.asList(str1.replace("[", "").replace("]", "").split(",")));
List<String> list2 = new ArrayList<String>(Arrays.asList(jsonArray.replace("[", "").replace("]", "").replace("\"", "").split(",")));
Collections.sort(list1);
Collections.sort(list2);
if (list1.equals(list2)) {
    System.out.println("str1 and str2 are equal");
}
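
To make the note about duplicates concrete, here is a small sketch that runs both strategies on inputs differing only in how often an element repeats. It skips JSONArray and strips the quotes with plain String.replace so that it runs on the JDK alone; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

final class DuplicateAwareCompare {
    // Removes brackets and quotes, then splits on commas.
    static List<String> elements(String s) {
        String body = s.replace("[", "").replace("]", "").replace("\"", "");
        return new ArrayList<>(Arrays.asList(body.split(",")));
    }

    // Set-based comparison: duplicates collapse, so counts are ignored.
    static boolean equalAsSets(String a, String b) {
        return new TreeSet<>(elements(a)).equals(new TreeSet<>(elements(b)));
    }

    // Sorted-list comparison: duplicates are kept, so counts must match too.
    static boolean equalAsSortedLists(String a, String b) {
        List<String> x = elements(a);
        List<String> y = elements(b);
        Collections.sort(x);
        Collections.sort(y);
        return x.equals(y);
    }

    public static void main(String[] args) {
        String a = "[0.1,0.1,0.2]";
        String b = "[\"0.2\",\"0.1\",\"0.2\"]";
        System.out.println(equalAsSets(a, b));        // true
        System.out.println(equalAsSortedLists(a, b)); // false
    }
}
```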
Raju Sharma
  • If `set2` is empty after removing `set1`, it doesn't mean they're equal. It could mean that `set2` is a subset of `set1`. Use `set1.equals(set2)` instead. – DodgyCodeException Sep 04 '17 at 13:36
  • How could that be? If all of set1's elements are removed from set2 and it becomes empty, then they are equal, otherwise not. – Raju Sharma Sep 04 '17 at 13:38
  • If `set2` is `{1, 2, 3}` and `set1` is `{1, 2, 3, 4}`, then removing `set1` from `set2` will give you an empty set. But they're still not equal. – Rogue Sep 04 '17 at 13:41
0

This is a pretty simple solution using HashSet.

Benefits of Set:

  • It cannot contain duplicates.
  • Insertion/deletion of an element is O(1) on average.
  • Membership checks are much faster than scanning an array. Keeping the element order is not important here, so a set is fine.

    String str1 = "[0.7419,0.7710,0.2487]";
    String str2 = "[\"0.7710\",\"0.7419\",\"0.2487\"]";
    
    Set<String> set1 = new HashSet<>();
    Set<String> set2 = new HashSet<>();
    
    String[] split1 = str1.replace("[", "").replace("]", "").split(",");
    String[] split2 = str2.replace("[", "").replace("]", "").replace("\"", "").split(",");
    set1.addAll(Arrays.asList(split1));
    set2.addAll(Arrays.asList(split2));
    
    System.out.println("set1: "+set1);
    System.out.println("set2: "+set2);
    
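    // Same size and nothing left after removing set2's elements means equal.
    // Note: removeAll modifies set1.
    boolean isEqual = false;
    if (set1.size() == set2.size()) {
        set1.removeAll(set2);
        if (set1.isEmpty()) {
            isEqual = true;
        }
    }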
    
    System.out.println("str1 and str2 "+( isEqual ? "Equal" : "Not Equal") );
    

output:

set1: [0.7710, 0.2487, 0.7419]
set2: [0.7710, 0.2487, 0.7419]
str1 and str2 Equal
nagendra547
  • If `set2` contains all elements of `set1` plus a few more, you won't be able to tell the difference from both sets being the same. – DodgyCodeException Sep 04 '17 at 13:03
  • or just substring it and then split once. No iterations in that for longer strings. – Rogue Sep 04 '17 at 13:04
  • @DodgyCodeException I don't see any such requirement in the question about keeping duplicates either. – nagendra547 Sep 04 '17 at 13:07
  • @nagendra547 The requirement is not just to compare those specific example values; it's to be able to compare any values. Your solution fails with inputs `str1 = "[0.7419,0.7710,0.2487]"` and `str2 = "[\"0.7710\",\"0.7419\",\"0.2487\", \"0.1234\"]"`. – DodgyCodeException Sep 04 '17 at 13:15
  • You are getting into the nitty-gritty of the solution. However, you can just check the sizes of the sets: if they are the same, do removeAll and then check the final size. – nagendra547 Sep 04 '17 at 13:27
  • Or you could use `Set.equals()` ;-) – DodgyCodeException Sep 04 '17 at 13:32
  • The equals method of HashSet is inefficient. Don't use it. :) Once you see the implementation in AbstractSet, I hope you will get it. It's not advisable to use the equals and hashCode methods of Java. – nagendra547 Sep 04 '17 at 13:35
  • Isn't `HashSet.removeAll()` just as (or probably more) inefficient? – DodgyCodeException Sep 04 '17 at 13:40
  • @nagendra547 that advice is not only wrong but dangerous. `#equals`/`#hashcode` are critical constructs of the language. `#equals` (for `HashSet`) will use all of those comparisons (e.g. size) followed by `#containsAll`, which will use the `O(1)` contains method from `HashSet` over the elements of the _passed_ collection (not the parent). Using `#removeAll` will iterate the input collection and call the (also) `O(1)` `#remove` method on `HashSet`. So as far as your argument of "inefficient", it's hot air. You're just reinventing the wheel. – Rogue Sep 04 '17 at 13:49
  • @Rogue read here https://stackoverflow.com/questions/2265503/why-do-i-need-to-override-the-equals-and-hashcode-methods-in-java – nagendra547 Sep 04 '17 at 13:54
  • @nagendra547 that SO question has nothing to do with your comment at hand, which is using `#equals` on a collection itself (not implementing hashability/equality into an object). – Rogue Sep 04 '17 at 13:56
  • Did you check the implementation of equals method of AbstractSet? Once you see the implementation in java and see my code then let me know which one will be faster. – nagendra547 Sep 04 '17 at 13:58
  • Yes I did, as I literally described how it works in my comment above, what's your justification? _What_ do you think is so "inefficient"/slow? Because from what I'm seeing you're saving no efficiency at the cost of unnecessarily modifying a collection + reinventing the wheel + more code. I would call this premature optimization but I don't think it even optimizes in this case. – Rogue Sep 04 '17 at 14:00
  • I'd also like to add that insertion is definitely not always O(1) as the underlying hashmap for the hashset will have to expand its array when it resizes (so number of elements [times the load factor] larger than the default container size, 16 elements, followed by 32, 64, etc). I don't believe it resizes on removal, but it definitely won't resize for #contains. – Rogue Sep 04 '17 at 14:31
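
For what it's worth, the two checks debated above give the same answer; the practical difference is that `removeAll` modifies the set it is called on. A minimal sketch, with a made-up class name:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class EqualsVersusRemoveAll {
    // The size check followed by removeAll, as in the answer; note that it mutates a.
    static boolean sizeThenRemoveAll(Set<String> a, Set<String> b) {
        if (a.size() != b.size()) {
            return false;
        }
        a.removeAll(b);
        return a.isEmpty();
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("x", "y", "z"));
        Set<String> b = new HashSet<>(Arrays.asList("z", "y", "x"));

        System.out.println(a.equals(b));             // true, a is unchanged
        System.out.println(sizeThenRemoveAll(a, b)); // true, but a is now empty
        System.out.println(a.size());                // 0
        System.out.println(a.equals(b));             // false, because a was emptied
    }
}
```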
0

Like this:

    String[] a1 = str1.replaceAll("^\\[|\\]$", "").split(",", -1);
    String[] a2 = str2.replaceAll("^\\[|\\]$", "").split(",", -1);
    for (int i = 0; i < a2.length; i++)
        a2[i] = a2[i].replaceAll("^\\\"|\\\"$", "");
    Arrays.sort(a1);
    Arrays.sort(a2);
    boolean stringsAreEqual = Arrays.equals(a1, a2);

Or you can use a fully functional approach (which may be slightly less efficient):

    boolean stringsAreEqual = Arrays.equals(
            Arrays.stream(str1.replaceAll("^\\[|\\]$", "").split(",", -1))
                    .sorted()
                    .toArray(),
            Arrays.stream(str2.replaceAll("^\\[|\\]$", "").split(",", -1))
                    .map(s -> s.replaceAll("^\\\"|\\\"$", ""))
                    .sorted()
                    .toArray()
    );

The advantage of using arrays over using sets (as proposed by others) is that arrays typically use less memory and they can hold duplicates. If your problem domain can include duplicate elements in each string, then sets can't be used.

DodgyCodeException
  • For a large number of elements, this is an inefficient solution, as you are sorting the two arrays when you only need to check whether they are equal. – nagendra547 Sep 04 '17 at 13:45
  • BTW, if using a `java.util.HashSet` is an issue because it cannot handle duplicates, it can be replaced with Google Guava HashMultiset: http://google.github.io/guava/releases/22.0/api/docs/com/google/common/collect/HashMultiset.html. – yegodm Sep 05 '17 at 13:13
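
If pulling in Guava is not an option, a similar duplicate-aware, order-insensitive comparison can be sketched with a plain `HashMap` of counts. This swaps in a JDK-only frequency map for `HashMultiset`; it is not Guava's API, and the class and method names are hypothetical. It also avoids sorting, which touches on the efficiency concern raised in the first comment.

```java
import java.util.HashMap;
import java.util.Map;

final class FrequencyCompare {
    // Counts occurrences of each element after stripping brackets and optional quotes.
    static Map<String, Integer> counts(String s) {
        Map<String, Integer> result = new HashMap<>();
        String body = s.replaceAll("^\\[|\\]$", "");
        if (body.isEmpty()) {
            return result;
        }
        for (String raw : body.split(",", -1)) {
            String element = raw.trim().replaceAll("^\"|\"$", "");
            result.merge(element, 1, Integer::sum);
        }
        return result;
    }

    // Two strings match when every element occurs the same number of times in both.
    static boolean sameElements(String a, String b) {
        return counts(a).equals(counts(b));
    }

    public static void main(String[] args) {
        System.out.println(sameElements("[a,b,a]", "[\"a\",\"a\",\"b\"]")); // true
        System.out.println(sameElements("[a,b,a]", "[\"a\",\"b\",\"b\"]")); // false
    }
}
```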
0

Google GSON can handle this task quite neatly by reading values as a Set<String>:

    final String str1 = "[0.7419,0.7710,0.2487]";
    final String str2 = "[\"0.7710\",\"0.7419\",\"0.2487\"]";
    final String str3 = "[\"0.3310\",\"0.7419\",\"0.2487\"]";
    final Gson gson = new Gson();
    final Type setOfStrings = new TypeToken<Set<String>>() {}.getType();
    final Set<String> set1 = gson.fromJson(str1, setOfStrings);
    final Set<String> set2 = gson.fromJson(str2, setOfStrings);
    final Set<String> set3 = gson.fromJson(str3, setOfStrings);

    System.out.println("Set #1:" + set1);
    System.out.println("Set #2:" + set2);
    System.out.println("Set #3:" + set3);
    System.out.println("Set #1 is equivalent to Set #2: " + set1.equals(set2));
    System.out.println("Set #1 is equivalent to Set #3: " + set1.equals(set3));

The output is:

Set #1:[0.7419, 0.7710, 0.2487]
Set #2:[0.7710, 0.7419, 0.2487]
Set #3:[0.3310, 0.7419, 0.2487]
Set #1 is equivalent to Set #2: true
Set #1 is equivalent to Set #3: false
yegodm