
EDIT: This was previously marked as a duplicate, but the reviewers don't seem to understand my question.

I have string data like:

Aabc123def457ghi123jkl950asd489are (no spaces or other characters to split on)
AB950asd489are
ABC950asd489are

and I want to split those into arrays of strings like this. EDIT: this is not just splitting each string on a character or substring; the splits I need are at the boundaries between alphabetic and numeric characters (and, as the first example shows, between uppercase and lowercase letters):

"Aabc123def457ghi123jkl950asd489are" => [A,abc,123,def,457,ghi,123,jkl,950,asd,489,are] (a plain String.split on a delimiter won't work here)
"AB950asd489are" => [AB,950,asd,489,are]
"ABC950asd489are" => [ABC,950,asd,489,are]

It's much like a currency formatter inserting commas, except that I want the pieces split into an array. I need to find a regex for that; or is there another way to do it?

General Grievance
Morilla Thaisa

4 Answers


This should suit your needs:

(?<=[A-Z])(?=[^A-Z])|(?<=[a-z])(?=[^a-z])|(?<=[0-9])(?=[^0-9])

(?<=[A-Z])(?=[^A-Z]) means "any position between two characters where the preceding character is uppercase and the following character is not".

The same logic applies to lowercase letters and digits.
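A minimal, self-contained sketch of the pattern above passed to String.split(); the class name BoundarySplit is only for illustration. Every split point needs a character on both sides (one lookbehind, one lookahead), so split() never produces an empty leading or trailing element here.

```java
import java.util.Arrays;

public class BoundarySplit {
    // Split between a char of one class (uppercase, lowercase, digit) and any char not of that class
    static final String REGEX =
        "(?<=[A-Z])(?=[^A-Z])|(?<=[a-z])(?=[^a-z])|(?<=[0-9])(?=[^0-9])";

    public static void main(String[] args) {
        String[] input = {"Aabc123def457ghi123jkl950asd489are",
                          "AB950asd489are", "ABC950asd489are"};
        for (String s : input) {
            // Lookarounds are zero-width, so no characters are consumed by the split
            System.out.println(Arrays.toString(s.split(REGEX)));
        }
    }
}
```

For the three sample strings this prints the same arrays listed in the question.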

sp00m
  • Yeah, the regex fulfills my need. I only know basic JavaScript regex. I had tried `/([A-Za-z0-9]{3})*([A-Za-z0-9]{3})$/i`, which was only correct for the last two characters. Using `/([A-Za-z0-9]{3})*([A-Za-z0-9]{3})*([A-Za-z0-9]{3})$/i`, only for the last three. Thanks for the help. – Morilla Thaisa May 29 '13 at 07:44
  • @MorillaThaisa But you tagged your question with Java! This regex won't work with JavaScript since lookbehinds aren't supported by its regex flavor... But I'm glad it could help you though :) – sp00m May 29 '13 at 08:50
  • I'm working on Android, and my previous attempt used `java.text.NumberFormat`. That's why I tagged it with Java. – Morilla Thaisa May 29 '13 at 10:11

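Have you tried anything?

You can split your strings without a regex: loop over the characters in a for loop and start a new piece whenever the kind of character changes (uppercase, digit, or neither), checking each one with Character.isUpperCase(char c) and Character.isDigit(char c).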
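A minimal sketch of that loop, assuming (as in the sample data) that every character that is neither uppercase nor a digit is a lowercase letter. The names ManualSplit, charClass and splitRuns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualSplit {
    // Hypothetical helper: 0 = uppercase, 1 = digit, 2 = anything else (lowercase in the sample data)
    static int charClass(char c) {
        if (Character.isUpperCase(c)) return 0;
        if (Character.isDigit(c)) return 1;
        return 2;
    }

    // Starts a new piece whenever the class of the current char differs from the previous one
    static List<String> splitRuns(String s) {
        List<String> parts = new ArrayList<>();
        if (s.isEmpty()) {
            return parts;
        }
        int start = 0;
        for (int i = 1; i < s.length(); i++) {
            if (charClass(s.charAt(i)) != charClass(s.charAt(i - 1))) {
                parts.add(s.substring(start, i));
                start = i;
            }
        }
        parts.add(s.substring(start)); // the final run
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitRuns("Aabc123def457ghi123jkl950asd489are"));
        System.out.println(splitRuns("AB950asd489are"));
        System.out.println(splitRuns("ABC950asd489are"));
    }
}
```

This avoids regex entirely and makes a single pass over each string, at the cost of a little more code than a one-line split().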

Massimo Variolo

Try this line:

s.split("(?<![a-z])(?=[a-z])|(?<=[a-z])(?![a-z])")

With your first example, it outputs:

String s = "Aabc123def457ghi123jkl950asd489are";
System.out.println(Arrays.toString(s.split("(?<![a-z])(?=[a-z])|(?<=[a-z])(?![a-z])")));

[A, abc, 123, def, 457, ghi, 123, jkl, 950, asd, 489, are]

Reading your question title again:

How to split string into array of three characters

it could be:

s.split("(?=[a-z]{3})|(?<=[a-z]{3})")

The output is the same.

Kent
  • This regex doesn't seem to work for the 2nd and 3rd line in the sample text provided in the question. You get AB950 returned for the second line and ABC950 for the third. – Francis Gagnon May 28 '13 at 12:27
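To reproduce the limitation pointed out in the comment above, here is the first pattern from this answer run against all three sample strings (the class name is illustrative). Only the start and end of each lowercase run are split points, so an uppercase run followed directly by digits stays together.

```java
import java.util.Arrays;

public class LowercaseBoundarySplit {
    // Split only where a run of lowercase letters starts or ends
    static final String REGEX = "(?<![a-z])(?=[a-z])|(?<=[a-z])(?![a-z])";

    public static void main(String[] args) {
        String[] input = {"Aabc123def457ghi123jkl950asd489are",
                          "AB950asd489are", "ABC950asd489are"};
        for (String s : input) {
            System.out.println(Arrays.toString(s.split(REGEX)));
        }
    }
}
```

The first string splits as expected, but "AB950asd489are" comes out as [AB950, asd, 489, are] and "ABC950asd489are" as [ABC950, asd, 489, are].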

Java regex code

import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        // Split after every uppercase, lowercase, or digit run
        String regex =
            "(?<=[A-Z])(?![A-Z])|(?<=[a-z])(?![a-z])|(?<=[0-9])(?![0-9])";
        System.out.println(
            Arrays.toString("Aabc123def457ghi123jkl950asd489are".split(regex)));
        System.out.println(Arrays.toString("AB950asd489are".split(regex)));
        System.out.println(Arrays.toString("ABC950asd489are".split(regex)));
    }
}

Output

[A, abc, 123, def, 457, ghi, 123, jkl, 950, asd, 489, are]
[AB, 950, asd, 489, are]
[ABC, 950, asd, 489, are]


Improving Performance

If you're applying this regex in a loop over many input strings, calling String.split() each time is not recommended. Why? Because, for a regex like this one, split() compiles the regex on every call even though the regex never changes. Internally, it works roughly like:

Pattern.compile(regex).split(strInput);

So, to improve performance, pre-compile the regex once and then split as many times as you like, without the overhead of compiling it on every split() call.

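import java.util.Arrays;
import java.util.regex.Pattern;

// Compile once, outside the loop
Pattern regex = Pattern.compile(
        "(?<=[A-Z])(?![A-Z])|(?<=[a-z])(?![a-z])|(?<=[0-9])(?![0-9])");

String[] input = {"Aabc123def457ghi123jkl950asd489are",
                  "AB950asd489are", "ABC950asd489are"};

// Reuse the same compiled Pattern for every input string
for (String strInput : input)
    System.out.println(Arrays.toString(regex.split(strInput)));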
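A different technique, not the one used in this answer: with a pre-compiled Pattern you can also match the runs themselves with Matcher.find() instead of matching the boundaries between them. This is a sketch; the class and method names are hypothetical, and it assumes the input contains only ASCII letters and digits, because any other character is silently skipped rather than kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunMatcher {
    // Compiled once: each match is one maximal run of uppercase letters, lowercase letters, or digits
    private static final Pattern RUN = Pattern.compile("[A-Z]+|[a-z]+|[0-9]+");

    static List<String> runs(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = RUN.matcher(s);
        while (m.find()) {
            parts.add(m.group()); // the text of the current run
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] input = {"Aabc123def457ghi123jkl950asd489are",
                          "AB950asd489are", "ABC950asd489are"};
        for (String s : input) {
            System.out.println(runs(s));
        }
    }
}
```

Describing what each piece looks like, rather than where the gaps are, avoids lookarounds altogether.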
Ravi K Thapliyal
  • The regex works, although it's different from sp00m's (the accepted answer, since that was the first answer I saw and tested). Does the form of a regex affect performance? Thanks for the help anyway :) – Morilla Thaisa May 29 '13 at 07:55
  • Yes, a regex can be written in many different ways, and a badly written one can hurt performance even if it works. Please check my edit for some more pointers on improving performance when using regexes in Java. – Ravi K Thapliyal May 29 '13 at 13:11