The following code excerpt runs into an infinite loop:

    public static final List<String> extractTags(String source, Integer nTags) {

      List<String> tags = new ArrayList<>();

      try (StringReader stringReader = new StringReader(source)) {
        String tag = "";
        char c;
        while ((c = (char) stringReader.read()) >= 0 && tags.size() < nTags) {
          switch (c) {
          case '<':
            tag = "";
            break;
          case '>':
            tags.add(tag);
            break;
          default:
            tag = tag + c;
            break;
          }
        }
      } catch (IOException e) {
      } finally {
        return tags;
      }
    }

The loop never ends when the method is invoked with the following parameters: `source = "trash"`, `nTags = 2`.

Using a debugger, I found that once the string has been read completely, the `read()` call keeps yielding the char `'\uFFFF'` (65535) forever. So my question is: why?

Thanks!

João Matos
  • A `return` statement in a `finally` block is a [bad idea](http://stackoverflow.com/questions/48088/returning-from-a-finally-block-in-java). – Jesper Nov 16 '16 at 13:01
  • Why don't you use `String.charAt()` instead? – Klitos Kyriacou Nov 16 '16 at 13:29
  • Why is `String.charAt()` better? – João Matos Nov 16 '16 at 13:31
  • From an ease-of-use point of view, it's not necessarily better. It is, however, unusual to use `StringReader` to iterate over a String, probably for performance reasons: `StringReader.read()` is implemented by calling `String.charAt()` inside a `synchronized` block, so calling `charAt()` directly bypasses this overhead. – Klitos Kyriacou Nov 17 '16 at 01:20

2 Answers

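Because `stringReader.read()` returns -1 at end of stream, but you're casting it to `char`, which is the only unsigned integer type in Java. So instead of -1 you get 65535 (`'\uFFFF'`) at end of stream, and `c >= 0` is always true. With `source = "trash"` no tag is ever added, so `tags.size() < nTags` stays true as well and the while loop never ends.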
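
The effect of the cast can be checked in isolation. A minimal sketch (the class name `CharCastDemo` and the helper `castToChar` are made up for illustration, they are not part of the question's code):

```java
public class CharCastDemo {

    // Hypothetical helper: mirrors the (char) cast from the question's loop
    // condition, then widens the char back to int so it can be printed.
    static int castToChar(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        int eof = -1; // what Reader.read() returns at end of stream

        System.out.println(castToChar(eof));        // 65535
        System.out.println((char) eof == '\uFFFF'); // true
        System.out.println((char) eof >= 0);        // true, a char is never negative
    }
}
```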

You might want to read into an `int`, compare it against -1 in the while condition, and cast it to `char` only inside the loop.
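
Something along these lines should work. This is only a sketch: the wrapper class `TagExtractor` and the `main` method are added for illustration, the `return` is moved out of the `finally` block as Jesper's comment suggests, and rethrowing the `IOException` as an `UncheckedIOException` is my own assumption about how you want to handle it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TagExtractor { // hypothetical wrapper class

    public static List<String> extractTags(String source, int nTags) {
        List<String> tags = new ArrayList<>();

        try (StringReader stringReader = new StringReader(source)) {
            String tag = "";
            int next; // keep the raw int so that -1 (end of stream) is still visible
            while (tags.size() < nTags && (next = stringReader.read()) != -1) {
                char c = (char) next; // cast only after the end-of-stream check
                switch (c) {
                case '<':
                    tag = "";
                    break;
                case '>':
                    tags.add(tag);
                    break;
                default:
                    tag = tag + c;
                    break;
                }
            }
        } catch (IOException e) {
            // Assumption: fail loudly instead of swallowing the exception.
            throw new UncheckedIOException(e);
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extractTags("trash", 2));          // []
        System.out.println(extractTags("<a>text<b>more", 2)); // [a, b]
    }
}
```

With `source = "trash"` and `nTags = 2` the loop now stops after five characters and the method returns an empty list.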

Kayaman

Make it more granular: `countTags(String source)` => use this method to count tags only; `extractTags(String source)` => identify what your tags are (or are not), then extract the tags (or the text that is not a tag).

Rebuild the string without tags, or rebuild it as you extract. `StringBuilder` and `StringReader` are not necessary.

One interesting point: in the extract method you can take the string's length at the start and at the end and subtract; the difference tells you how much tag text was removed, which can help you derive the count.

You also don't need a while loop for this.

For your actual problem: you might want to look into characters that need to be escaped.

Timetrax