
I have a string input (s) that I need to split on anything that is not a letter (a-z or A-Z). The solution I've found so far is to split the string on spaces, but I've realised it would be more efficient to split on any non-alphabetic character. That would let me output the resulting array straight away.

Question:

My question is: how do I split a string on any one of several characters?

I basically need a version of this:

String[] inputSplit = input.split(" ", "!", ",", "?", ".", "_", "'", "@");

but one that actually compiles and works.

The code I've written so far isn't really relevant to the question, but I'll put it here anyway in case it helps:

import java.util.Arrays;
import java.util.Scanner;

public class StringTokens {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String input = scan.nextLine();
        // Splits by " "
        String[] inputSplit = input.split(" ");
        // Counts the number of tokens
        int n = inputSplit.length;

        System.out.println(n);
        for (int i = 0; i < inputSplit.length; i++) {
            // Prints each of the items in the array
            System.out.println(inputSplit[i]);
        }
    }
}
  • Regexes matching anything that is NOT a letter: `[^a-zA-Z]`, `\P{Alpha}`, `\P{L}`, `\P{javaLetter}`, `\P{javaAlphabetic}` – Andreas Jun 18 '21 at 00:22
  • @Andreas - Thank you, this was very helpful. Are there any benefits to using one of these over the others, or can I pretty much use `[^a-zA-Z]` for most use cases? –  Jun 18 '21 at 10:31
  • [`Alpha`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#posix) is the same as `[a-zA-Z]`, as long as flag [`(?U)`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS) (`UNICODE_CHARACTER_CLASS`) isn't specified. --- [`L`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#ucc) is all Unicode letters, including accented letters (`á`, `é`, `í`, `ó`, `ú`, `ü`, `ñ`, ...) and letters like the German double-S (`ß`). – Andreas Jun 18 '21 at 13:43
  • [`javaLetter`](https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isLetter-int-) is the same as `L`. --- [`javaAlphabetic`](https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isAlphabetic-int-) also includes "letter numbers" (Unicode category `Nl`), e.g. Roman numerals (`Ⅰ`, `Ⅱ`, `Ⅲ`, `Ⅳ`, ...). – Andreas Jun 18 '21 at 13:49
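
To make the comments above concrete, here is a minimal sketch of how the suggested patterns could be passed to `String.split`. The class name `SplitOnNonLetters`, the helper methods and the sample inputs are made up for illustration and are not from the question. A `+` is appended to each pattern so that a run of separators such as `", "` is treated as one delimiter, and empty tokens are filtered out because `split` keeps a leading empty string when the input starts with a delimiter.

```java
import java.util.Arrays;

public class SplitOnNonLetters {
    // Splits on runs of characters that are not ASCII letters (a-z, A-Z).
    public static String[] splitAscii(String input) {
        return tokens(input, "[^a-zA-Z]+");
    }

    // Splits on runs of characters that are not Unicode letters (category L).
    public static String[] splitUnicode(String input) {
        return tokens(input, "\\P{L}+");
    }

    // Hypothetical helper: splits by the regex and drops empty tokens,
    // e.g. the leading "" produced when the input starts with a delimiter.
    private static String[] tokens(String input, String regex) {
        return Arrays.stream(input.split(regex))
                     .filter(t -> !t.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only.
        String input = "He is a very very good boy, isn't he?";
        String[] parts = splitAscii(input);

        System.out.println(parts.length);
        for (String part : parts) {
            System.out.println(part);
        }

        // "caf\u00e9" is "café" and "stra\u00dfe" is "straße",
        // written with escapes to avoid source-encoding issues.
        String accented = "caf\u00e9, stra\u00dfe!";
        System.out.println(Arrays.toString(splitAscii(accented)));
        System.out.println(Arrays.toString(splitUnicode(accented)));
    }
}
```

With the first sample input this prints 10 followed by the ten tokens (the apostrophe splits `isn't` into `isn` and `t`). On the accented input the ASCII pattern breaks words at `é` and `ß`, while `\P{L}+` keeps `café` and `straße` intact, which is the practical difference between the patterns discussed in the comments.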
