I have this code to convert all the text before "=" to uppercase:

Matcher m = Pattern.compile("((?:^|\n).*?=)").matcher(conteudo);
while (m.find()) {
  conteudo = conteudo.replaceFirst(m.group(1), m.group(1).toUpperCase());
}

But when the string is very large, this becomes very slow. I want to find a faster way to do it.

Any suggestions?

EDIT

I didn't explain this properly. I have text like this:

field=value
field2=value2
field3=value3

And I want to convert each line like this:

FIELD=value
FIELD2=value2
FIELD3=value3

4 Answers

2

The fastest way to get regex to work fast is to not use regex. Regex was never meant to be and almost never is a good choice for performance-sensitive operations. (Further reading: Why are regular expressions so controversial?)

Try using String class methods instead, or write a custom method that does what you want. For example, split on '=' and then call .toUpperCase() on the trailing part of each token (whatever comes after the \n). Alternatively, convert to char[] or use charAt() and traverse the string manually, uppercasing characters after a newline and copying them unchanged after '='.

For example:

public static String changeCase(String s) {
    // True while we are in the part of a line before '='.
    boolean capitalize = true;
    int len = s.length();
    char[] output = new char[len];
    for (int i = 0; i < len; i++) {
        char input = s.charAt(i);
        if (input == '\n') {
            // A new line starts: uppercase again until the next '='.
            capitalize = true;
            output[i] = input;
        } else if (input == '=') {
            // Past the key: copy the rest of the line unchanged.
            capitalize = false;
            output[i] = input;
        } else {
            output[i] = capitalize ? Character.toUpperCase(input) : input;
        }
    }
    return new String(output);
}

Method input:

field=value\n
field2=value2\n
field3=value3

Method output:

FIELD=value\n
FIELD2=value2\n
FIELD3=value3

Try it here: http://ideone.com/k0p67j

PS (by Jamie Zawinski):

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Community
  • It should be noted that in many high-level languages (i.e. without a compiler) regexes are faster than your method. Just try this exact method in Python, PHP, Ruby, Perl, etc. and it will be faster to use a properly written regex instead. – Wolph Oct 31 '14 at 14:36
  • @Wolph while I agree with you in general (i.e. that using regex in interpreted languages is not necessarily slower than using hand-crafted string processing code), I'd still point out two things: a) the question was explicitly about Java performance, b) Regex was never meant to be and almost never is a good choice for performance-sensitive operations. Using interpreted languages is *never* a good choice for performance-sensitive operations, with the languages you mentioned having a performance penalty of one to two orders of magnitude. As such, I consider my main point still valid. –  Oct 31 '14 at 14:49
  • nb I assume you don't mean "high-level languages" but "interpreted languages" per se - and, even then, the difference in favour of regex is from interpreting overhead, not from wrong algorithmic approach. In AOT-compiled languages, regex always incurs a performance penalty, the same way it's faster to have own, custom-tailored parser, than to use scanf with format string to parse the data, it's faster to use a native array for random-access data than to use a high-level data abstraction etc. –  Oct 31 '14 at 14:51
  • I meant high-level languages, the "without a compiler" part was meant as an example. I agree with you that regexes are not meant for performance btw, I'm just noting that it might still be the better option in some cases. – Wolph Oct 31 '14 at 14:59
  • It is not always true that regular expression-based solutions are slower than hand-crafted string processing: advanced regex engines may implement very fast search algorithms like Boyer-Moore-Horspool, which can accelerate processing drastically. Take a look at github.com/aunkrig/lfr . – Arno Unkrig Jul 29 '22 at 06:17
1

With a multiline regex we can simply get every line separately and replace it :)

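import java.util.regex.Matcher;
import java.util.regex.Pattern;

String conteudo = "field=value\nfield2=value2\nfield3=value3";
Pattern pattern = Pattern.compile("^([^=]+=)(.*)$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(conteudo);
StringBuffer result = new StringBuffer();

while (matcher.find()) {
    // quoteReplacement keeps '$' and '\' in the data from being read as group references.
    matcher.appendReplacement(result, Matcher.quoteReplacement(matcher.group(1).toUpperCase() + matcher.group(2)));
}
// Copy whatever follows the last match (e.g. a trailing newline).
matcher.appendTail(result);
System.out.println(conteudo);
System.out.println(result.toString());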
Wolph
0

What about something like this? indexOf should be fast enough.

int equalsIdx = conteudo.indexOf('=');
String result = conteudo.substring(0, equalsIdx).toUpperCase() + conteudo.substring(equalsIdx, conteudo.length());
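
The snippet above uppercases only the text before the first '=' in the whole string, and substring would throw if there is no '=' at all. A minimal sketch of the same indexOf idea applied line by line, as the question's EDIT asks, might look like the following. The class name UppercaseKeys and the method upperCaseKeys are hypothetical names chosen for illustration, and the sketch assumes lines are separated by '\n'.

```java
public class UppercaseKeys {

    // Hypothetical helper: uppercases the text before the first '=' on each line.
    // Lines without '=' are copied unchanged.
    static String upperCaseKeys(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int lineStart = 0;
        while (lineStart < text.length()) {
            // Find the end of the current line (or the end of the text).
            int lineEnd = text.indexOf('\n', lineStart);
            if (lineEnd < 0) {
                lineEnd = text.length();
            }
            // Look for the first '=' and check that it belongs to this line.
            int eq = text.indexOf('=', lineStart);
            if (eq >= 0 && eq < lineEnd) {
                out.append(text.substring(lineStart, eq).toUpperCase());
                out.append(text, eq, lineEnd);
            } else {
                out.append(text, lineStart, lineEnd);
            }
            // Re-emit the newline that ended this line, if there was one.
            if (lineEnd < text.length()) {
                out.append('\n');
            }
            lineStart = lineEnd + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseKeys("field=value\nfield2=value2\nfield3=value3"));
    }
}
```

This makes a single pass over the input and builds the result in one StringBuilder, so it avoids re-scanning the whole string for every match the way repeated replaceFirst calls do.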
JiriS
0
((?:^|\n)[^=]*=)

Try this.
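
One way this pattern could be applied in a single pass, instead of calling replaceFirst on the whole string for every match, is sketched below. The class name RegexUppercaseKeys and the method upperCaseKeys are hypothetical names for illustration. Note one assumption about the input: because [^=]* also matches newlines, a line that contains no '=' would be uppercased together with the key on the following line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUppercaseKeys {

    // Compile once and reuse; the "\n" here is a literal newline in the pattern.
    private static final Pattern KEY = Pattern.compile("((?:^|\n)[^=]*=)");

    // Hypothetical helper: uppercases every match of KEY in one pass.
    static String upperCaseKeys(String text) {
        Matcher m = KEY.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            // quoteReplacement stops '$' and '\' in the key from being
            // interpreted as group references in the replacement string.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1).toUpperCase()));
        }
        // Append the text after the last match (the final value).
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseKeys("field=value\nfield2=value2\nfield3=value3"));
    }
}
```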

vks