0

I have a string like:

Hello how how how are are you you?

I love cookies cookies, apples and pancakes pancakes.

I want the output to be:

Hello how are you?

I love cookies, apples and pancakes.

So far I have written:

String[] s = input.split(" ");
String prev = s[0];
String ans = prev + " ";

for (int i = 1; i < s.length; i++) {

    if (!prev.equals(s[i])) {
        prev = s[i];
        ans += prev + " ";
    }
}

System.out.println(ans);

The output I get is:

Hello how are you you?

I love cookies cookies, apples and pancakes pancakes.

I need some help with the logic for handling punctuation: , . ! ? ...

Mike
  • 14,010
  • 29
  • 101
  • 161
  • @JBNizet I find your comment rude, the author of the post said he needs help with the logic, meaning he already knows that they are not the same, and since he already knows that it gives problems, suggesting him to debug isn't going to solve the problem – Ferrybig Mar 16 '19 at 14:39
  • @JBNizet yes, I know `cookies` is not equal to `cookies,`. I need help with the logic so that my program treats them as the same word and keeps the one with the punctuation – Sandeep Ranjan Mar 16 '19 at 14:39
  • Possible duplicate of [How can I eliminate duplicate words from String in Java?](https://stackoverflow.com/questions/42770863/how-can-i-eliminate-duplicate-words-from-string-in-java) – GBrandt Mar 16 '19 at 14:40
  • What you need is called _[Tokenization](https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html)_. – Mike Mar 16 '19 at 14:40
  • @TiiJ7, there was a wrong usage of formatting — a code quote style for the text quote. – Mike Mar 16 '19 at 14:45
  • @MikeB. So what does bad formatting have to do with changing the actual output text? You just duplicated the word "pancakes", making the expected output change, and invalidating the posted answer – Ferrybig Mar 16 '19 at 14:47
  • @Ferrybig, yes, it was my fault during the text formatting. Now everything is fixed. – Mike Mar 16 '19 at 15:17

4 Answers

4

You can use a regex to do this for you. Sample code:

String regex = "\\b(\\w+)\\b\\s*(?=.*\\b\\1\\b)";
input = input.replaceAll(regex,"");
  1. \b Matches a word boundary position between a word character and non-word character or position (start / end of string).
  2. \w Matches any word character (alphanumeric & underscore).
  3. \b Matches a word boundary position between a word character and non-word character or position (start / end of string).
  4. \s Matches any whitespace character (spaces, tabs, line breaks).
  5. * Match 0 or more of the preceding token.
  6. (?= Starts a positive lookahead: it matches a group after the main expression without including it in the result.
  7. . Matches any character except line breaks.
  8. \1 Matches the results of capture group #1 in step 2.

Note: It is important to use word boundaries here to avoid matching partial words.

Here's a link to a regex demo and explanation: RegexDemo
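As a quick way to try the pattern locally, here is a minimal sketch. The class and method names (`RegexDedupDemo`, `removeLaterRepeats`, `collapseAdjacent`) are made up for illustration. One thing worth knowing: because the lookahead uses `.*`, the pattern removes a word whenever the same word occurs again anywhere later on the line, not only when it is immediately repeated. The second pattern, `ADJACENT_REPEAT`, is my own assumption of how one might restrict it to consecutive repeats; it is not part of the answer above.

```java
class RegexDedupDemo {
    // Pattern from the answer: removes a word if the same whole word occurs again later on the line.
    static final String LATER_REPEAT = "\\b(\\w+)\\b\\s*(?=.*\\b\\1\\b)";

    // Assumed alternative (not from the answer): collapses only immediately repeated words.
    static final String ADJACENT_REPEAT = "\\b(\\w+)(?:\\s+\\1\\b)+";

    static String removeLaterRepeats(String input) {
        return input.replaceAll(LATER_REPEAT, "");
    }

    static String collapseAdjacent(String input) {
        // "$1" keeps a single copy of the repeated word.
        return input.replaceAll(ADJACENT_REPEAT, "$1");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Hello how how how are are you you?",
            "I love cookies cookies, apples and pancakes pancakes.",
            "the cat saw the dog"
        };
        for (String s : samples) {
            System.out.println(removeLaterRepeats(s));
            System.out.println(collapseAdjacent(s));
        }
    }
}
```

For the two strings from the question both patterns give the desired output. For `"the cat saw the dog"` the first pattern prints `cat saw the dog`, while the second leaves the sentence unchanged.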

Mustahsan
  • 3,852
  • 1
  • 18
  • 34
2

You should use a secondary variable to store your words without the punctuation.

String[] s = input.split(" ");
String ans = "";

for (int i = 0; i < s.length - 1; i++) {

    String currentAux = s[i].replaceAll("[,.!?]", "");
    String nextAux = s[i + 1].replaceAll("[,.!?]", "");

    if (nextAux.equals(currentAux)) {
        continue;
    }

    ans += " " + s[i];
}

ans += " " + s[s.length - 1];

System.out.println(ans.trim()); // trim() drops the leading space added before the first word
lpinto.eu
  • 2,077
  • 4
  • 21
  • 45
  • For `"Hello how how how are are you you?"` it returns `Hello how are you`. `?` is missing – Sandeep Ranjan Mar 16 '19 at 15:10
  • @SandeepRanjan try it again now :) – lpinto.eu Mar 16 '19 at 15:27
  • This is a good answer, but I think you should add colon and semicolon in your calls to `replaceAll` because sentences like `"I had a huge meal meal; however, I I am already hungry again again."` will not be handled correctly - `meal` will appear twice. – D.B. Mar 16 '19 at 16:19
  • @D.B. we can add a lot of symbols; the ones I used are the ones requested at the end of the question. – lpinto.eu Mar 16 '19 at 16:46
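Following up on the colon/semicolon point in the comments, here is a rough sketch of the same idea that strips every ASCII punctuation character via `\p{Punct}` before comparing, instead of listing symbols one by one. The names `PunctuationAwareDedup`, `bare` and `dedupe` are hypothetical. Be aware that `\p{Punct}` also strips apostrophes and hyphens, so `it's` and `its` would compare equal; whether that is acceptable depends on your input.

```java
class PunctuationAwareDedup {
    // Hypothetical helper: removes ASCII punctuation so "meal" and "meal;" compare equal.
    static String bare(String token) {
        return token.replaceAll("\\p{Punct}", "");
    }

    static String dedupe(String input) {
        String[] words = input.split(" ");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            boolean sameAsNext = i + 1 < words.length
                    && bare(words[i]).equals(bare(words[i + 1]));
            if (sameAsNext) {
                continue; // skip this copy; the last copy carries any punctuation
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(words[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dedupe("I had a huge meal meal; however, I I am already hungry again again."));
        // prints: I had a huge meal; however, I am already hungry again.
    }
}
```

Using a `StringBuilder` and adding the separator only between words also avoids the leading space you get from `ans += " " + s[i]`.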
2

You can use java.util.StringTokenizer to tokenize the words. Make sure to set the delimiters to split the words. In your case they are spaces, commas and full stops. This can help you to split the words without the punctuation marks. Then you can compare the previous token with the current and if they are equal you can ignore it.

You can try this code snippet:

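// JDK classes used by this snippet
import java.util.LinkedList;
import java.util.List;
import java.util.StringTokenizer;

String s = "I love cookies cookies, apples and pancakes pancakes.";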

StringTokenizer tokenizer = new StringTokenizer(s, " ,.", true);

List<String> duplicateRemovedTokenList = new LinkedList<>();

String prevToken = null;

while (tokenizer.hasMoreTokens()) {

    String currentToken = tokenizer.nextToken();

    if (currentToken.equals(" ")) {
        duplicateRemovedTokenList.add(currentToken);
        continue;
    }

    if (!currentToken.equals(prevToken)) {
        duplicateRemovedTokenList.add(currentToken);
        prevToken = currentToken;
    }
}

// String.join is the JDK equivalent of StringUtils.join from Apache Commons Lang
String duplicateRemovedString = String.join("", duplicateRemovedTokenList);
LeoN
  • 1,590
  • 1
  • 11
  • 19
  • This has a few problems: it adds extra spaces and doesn't work with inputs like `"I love cookies, cookies, apples and pancakes pancakes."` (note the extra comma after the first `cookies`). – D.B. Mar 16 '19 at 16:10
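One possible way to address both points from the comment while staying with `StringTokenizer` is sketched below. It rests on assumptions that are mine, not the answer's: the delimiter set is widened to `" ,.!?;:"`, and a word counts as a repeat if it equals the previous word even when punctuation sits between the two copies. When a repeat is found, everything written since the previous word (the spaces or punctuation in between) is dropped. The names `TokenizerDedup` and `dedupe` are hypothetical.

```java
import java.util.StringTokenizer;

class TokenizerDedup {
    // Assumed delimiter set; extend it if your text uses other punctuation.
    static final String DELIMS = " ,.!?;:";

    static String dedupe(String input) {
        // returnDelims = true: each delimiter comes back as its own one-character token.
        StringTokenizer tokenizer = new StringTokenizer(input, DELIMS, true);
        StringBuilder out = new StringBuilder();
        String lastWord = null;
        int endOfLastWord = 0; // length of 'out' right after the last word was appended

        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            boolean isDelimiter = token.length() == 1 && DELIMS.contains(token);
            if (isDelimiter) {
                out.append(token);
                continue;
            }
            if (token.equals(lastWord)) {
                // Repeated word: discard the delimiters written between the two copies.
                out.setLength(endOfLastWord);
                continue;
            }
            out.append(token);
            lastWord = token;
            endOfLastWord = out.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dedupe("I love cookies cookies, apples and pancakes pancakes."));
        System.out.println(dedupe("I love cookies, cookies, apples and pancakes pancakes."));
    }
}
```

Both calls print `I love cookies, apples and pancakes.` The trade-off is that a word repeated across a sentence boundary, such as `"No. No."`, is also collapsed, which may or may not be what you want.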
0

If you are looking for a one-liner, here is a Java 8 based solution:

Stream.of(input.split(" ")).distinct().reduce((a, b) -> a + " " + b).orElse("")
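To see what this one-liner does in practice, here is a small runnable sketch; the wrapper class `DistinctDemo` and method `dedupe` are names I made up. Two caveats show up in the results: `distinct()` drops every later occurrence of a token, not just consecutive ones, and it compares tokens exactly, so `you` and `you?` count as different words.

```java
import java.util.stream.Stream;

class DistinctDemo {
    static String dedupe(String input) {
        // Same expression as the one-liner, wrapped in a method so it can be called.
        return Stream.of(input.split(" "))
                .distinct()
                .reduce((a, b) -> a + " " + b)
                .orElse("");
    }

    public static void main(String[] args) {
        System.out.println(dedupe("Hello how how how are are you you?"));
        // prints: Hello how are you you?
        System.out.println(dedupe("to be or not to be"));
        // prints: to be or not
    }
}
```

So for the inputs in the question it gives the same result as the original loop, and it additionally removes non-adjacent repeats.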