I have code that removes duplicate words from a string. Let's say I have:

This is serious serious work. I apply the code and get: This is serious work

This is the code:

 return Arrays.stream(input.split(" ")).distinct().collect(Collectors.joining(" "));

Now I want to add a new constraint: if the string/line is longer than 78 characters, break and indent it where it makes sense, so that no line runs longer than 78 characters. Example:

This one is a very long line that runs off the right side because it is longer than 78 characters long

It should then be

This one is a very long line that runs off the right side because it is longer 
  than 78 characters long

I can't find a solution to this. It was brought to my attention that my question may be a duplicate, but I can't find my answer there: I need to be able to indent.

Devee
  • Why not use something like: txt.substring(0,77) + "\n " + txt.substring(78); ? – dorony Nov 11 '18 at 23:45
  • @dorony That might split in the middle of a word – GBlodgett Nov 11 '18 at 23:50
  • Possible duplicate of [Wrap the string after a number of characters word-wise in Java](https://stackoverflow.com/questions/4212675/wrap-the-string-after-a-number-of-characters-word-wise-in-java) – azro Nov 11 '18 at 23:54

2 Answers

You could create a StringBuilder from the String and then replace the last word break within the first 78 characters with a newline and a tab. You can find that word break by searching backwards for the last space at or before index 78. Note that this inserts only one break, so it assumes the text fits on two lines:

StringBuilder sb = new StringBuilder(Arrays.stream(input.split(" ")).distinct().collect(Collectors.joining(" ")));
if (sb.length() > 78) {
    // last space at or before index 78, so the first line holds at most 78 characters
    int lastWordBreak = sb.lastIndexOf(" ", 78);
    if (lastWordBreak > 0) {
        // replace the space itself, so the second line does not start with a stray space
        sb.replace(lastWordBreak, lastWordBreak + 1, "\n\t");
    }
}
return sb.toString();

Output:

This one is a very long line that runs off the right side because it longer
     than 78 characters
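
The snippet above adds at most one line break. A minimal sketch of repeating the same idea until every line fits is shown here; the class name `WrapSketch`, the method `wrap`, and its `limit` and `indent` parameters are illustrative assumptions, not part of the question's code. It assumes the indent is shorter than the limit.

```java
class WrapSketch {

    // Hypothetical helper: breaks `text` at the last space that keeps each
    // line within `limit` characters and starts every continuation line
    // with `indent` (the indent counts toward the line length).
    static String wrap(String text, int limit, String indent) {
        if (limit <= indent.length()) {
            throw new IllegalArgumentException("limit must exceed indent length");
        }
        StringBuilder sb = new StringBuilder(text);
        int lineStart = 0;    // index of the first character of the current line
        int contentStart = 0; // index of the first character after the indent
        while (sb.length() - lineStart > limit) {
            // Last space that still leaves the line at most `limit` long.
            int breakAt = sb.lastIndexOf(" ", lineStart + limit);
            if (breakAt < contentStart) {
                // A single word is longer than the limit: break after it instead.
                breakAt = sb.indexOf(" ", lineStart + limit);
                if (breakAt < 0) {
                    break; // no more spaces, nothing left to wrap
                }
            }
            // Replace the space with a newline followed by the indent.
            sb.replace(breakAt, breakAt + 1, "\n" + indent);
            lineStart = breakAt + 1;
            contentStart = lineStart + indent.length();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "This one is a very long line that runs off the right side "
                + "because it is longer than 78 characters long";
        System.out.println(wrap(input, 78, "  "));
    }
}
```

With the example line from the question, a limit of 78 and a two-space indent, the break lands after "longer", which is exactly the 78th character.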

Also, your Stream does not quite do what you want. It removes every repeated word, not only words that are repeated back to back. So for the String:

This is a great sentence. It is a great example.

It would remove the duplicate is, great and a, and return

This is a great sentence. It example.

To only remove consecutive duplicate words you can look at the following solution:

Alternatively, you could write your own by splitting the text into words and comparing each element to the one next to it, dropping the consecutive duplicate words.
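
A minimal sketch of that approach, assuming words are separated by spaces; the class and method names are made up for illustration. It compares each word with the last one kept, which has the same effect as comparing neighbours.

```java
import java.util.ArrayList;
import java.util.List;

class ConsecutiveDuplicates {

    // Hypothetical helper: keeps a word only if it differs from the
    // word kept immediately before it.
    static String removeConsecutiveDuplicates(String input) {
        List<String> kept = new ArrayList<>();
        for (String word : input.trim().split(" +")) {
            if (kept.isEmpty() || !kept.get(kept.size() - 1).equals(word)) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // prints "This is serious work"
        System.out.println(removeConsecutiveDuplicates("This is serious serious work"));
        // unchanged, because no word is repeated back to back
        System.out.println(removeConsecutiveDuplicates("This is a great sentence. It is a great example."));
    }
}
```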

GBlodgett
  • Thank you very much for your help. But for some reason your solution seems to be cutting off the "is" in the original string – Devee Nov 12 '18 at 00:07
  • @Devee That is because in the `Stream` you created you only allow one of each word. If you only want to remove consecutive duplicate words you will have to change the `Stream` – GBlodgett Nov 12 '18 at 02:08

Instead of using

Collectors.joining(" ")

it is possible to write a custom collector that adds new lines and indentation at proper places.

Let's introduce a LineWrapper class, which contains indent and limit fields:

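import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class LineWrapper {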

  private final int limit;
  private final String indent;

The default constructor sets the fields to reasonable default values. Note how the indent starts with a new line character.

  public LineWrapper() {
    limit = 78;
    indent = "\n  ";
  }

A custom constructor allows the client to specify limit and indent:

  public LineWrapper(int limit, String indent) {
    if (limit <= 0) {
      throw new IllegalArgumentException("limit");
    }
    if (indent == null || !indent.matches("\\n *")) {
      throw new IllegalArgumentException("indent");
    }
    this.limit = limit;
    this.indent = indent;
  }

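The following regex is used to split the input around one or more spaces. This ensures that runs of spaces between words do not produce empty Strings (a leading space in the input still yields one empty first element, so trim the input first if that matters):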

  private static final String SPACES = " +";
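
To see the difference between the two patterns, here is a small sketch; `SplitDemo` and its method names are illustrative only.

```java
import java.util.Arrays;

class SplitDemo {

    static String[] splitOnSingleSpace(String s) {
        return s.split(" ");
    }

    static String[] splitOnSpaces(String s) {
        return s.split(" +");
    }

    public static void main(String[] args) {
        // Two spaces between the words: a single-space split yields an empty element.
        System.out.println(Arrays.toString(splitOnSingleSpace("a  b"))); // [a, , b]
        System.out.println(Arrays.toString(splitOnSpaces("a  b")));      // [a, b]
        // A leading space still produces an empty first element.
        System.out.println(Arrays.toString(splitOnSpaces(" a b")));      // [, a, b]
    }
}
```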

The apply method splits the input and collects the words into lines of the specified maximum length, indents the lines and removes duplicate consecutive words. Note how duplicates are not removed using the Stream.distinct method, since it also removes duplicates that are not consecutive.

  public String apply(String input) {
    return Arrays.stream(input.split(SPACES)).collect(toWrappedString());
  }

The toWrappedString method returns a collector that accumulates the words in a new ArrayList, and uses the following methods:

  • addIfDistinct: to add the words to the ArrayList
  • combine: to merge two array lists
  • wrap: to split and indent the lines

  Collector<String, ArrayList<String>, String> toWrappedString() {
    return Collector.of(ArrayList::new, 
                        this::addIfDistinct, 
                        this::combine, 
                        this::wrap);
  }

The addIfDistinct method adds the word to the accumulator ArrayList if it is different from the previous word.

  void addIfDistinct(ArrayList<String> accumulator, String word) {
    if (!accumulator.isEmpty()) {
      String lastWord = accumulator.get(accumulator.size() - 1);
      if (!lastWord.equals(word)) {
        accumulator.add(word);
      }
    } else {
      accumulator.add(word);
    }
  }

The combine method adds all words from the second ArrayList to the first one. It also makes sure that the first word of the second ArrayList does not duplicate the last word of the first ArrayList.

  ArrayList<String> combine(ArrayList<String> words, 
                            ArrayList<String> moreWords) {
    List<String> other = moreWords;
    if (!words.isEmpty() && !other.isEmpty()) {
      String lastWord = words.get(words.size() - 1);
      if (lastWord.equals(other.get(0))) {
        other = other.subList(1, other.size());
      }
    }
    words.addAll(other);
    return words;
  }
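
The boundary check matters when the stream is processed in parallel: each partial list is already free of consecutive duplicates, but a duplicate pair can straddle two lists. A small sketch, reusing the same logic in a standalone class named `CombineDemo` purely for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CombineDemo {

    // Same logic as the combine method above, made static for the demo.
    static ArrayList<String> combine(ArrayList<String> words,
                                     ArrayList<String> moreWords) {
        List<String> other = moreWords;
        if (!words.isEmpty() && !other.isEmpty()
                && words.get(words.size() - 1).equals(other.get(0))) {
            // Skip the first word of the second list: it repeats the
            // last word of the first list.
            other = other.subList(1, other.size());
        }
        words.addAll(other);
        return words;
    }

    public static void main(String[] args) {
        ArrayList<String> left = new ArrayList<>(Arrays.asList("This", "is", "serious"));
        ArrayList<String> right = new ArrayList<>(Arrays.asList("serious", "work"));
        System.out.println(combine(left, right)); // [This, is, serious, work]
    }
}
```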

Finally, the wrap method appends all words to a StringBuilder, inserting the indent when the line length limit is reached:

  String wrap(ArrayList<String> words) {
    StringBuilder result = new StringBuilder();

    if (!words.isEmpty()) {
      String firstWord = words.get(0);
      result.append(firstWord);
      int lineLength = firstWord.length();

      for (String word : words.subList(1, words.size())) {
        //add 1 to the word length,
        //to account for the space character
        int len = word.length() + 1;
        if (lineLength + len <= limit) {
          result.append(' ');
          result.append(word);
          lineLength += len;
        } else {
          result.append(indent);
          result.append(word);
          //subtract 1 from the indent length,
          //because the new line does not count
          lineLength = indent.length() - 1 + word.length();
        }
      }
    }

    return result.toString();
  }
}
Pietro Boido