Questions tagged [character-class]

A character class is a regular expression construct that defines a set of metacharacters and literal characters and matches any one character from that set. Use [regex-negation] for questions about complementing character classes in regex.
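
As a quick illustration of the tag description, here is a minimal sketch using Python's standard `re` module. The pattern names and sample strings are made up for this example and are not taken from any question below.

```python
import re

# A literal list: matches any single vowel.
vowel = re.compile(r"[aeiou]")
# Ranges: matches any single hexadecimal digit.
hex_digit = re.compile(r"[0-9a-fA-F]")
# A negated class: matches any single character that is not an ASCII digit.
not_digit = re.compile(r"[^0-9]")

vowels_found = vowel.findall("character")   # every vowel, in order
is_hex = hex_digit.fullmatch("g") is not None
digits_only = not_digit.sub("", "a1b2")     # strip everything except digits
```

Each class consumes exactly one character per match; quantifiers such as `+` or `{10}` are what extend it to runs of characters.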

99 questions
180
votes
2 answers

Regular expression \p{L} and \p{N}

I am new to regular expressions and have been given the following regular expression: (\p{L}|\p{N}|_|-|\.)* I know what * means, that | means "or", and that \ escapes. But I don't know what \p{L} and \p{N} mean. I have searched Google for it,…
Diemauerdk
  • 5,238
  • 9
  • 40
  • 56
76
votes
5 answers

Pattern matching digits does not work in egrep?

Why can't I match the string "1234567-1234567890" with the given regular expression \d{7}-\d{10} with egrep from the shell like this: egrep \d{7}-\d{10} file ?
user377622
  • 865
  • 1
  • 6
  • 8
48
votes
1 answer

How can I exclude some characters from a class?

Say I want to match a "word" character (\w), but exclude "_", or match a whitespace character (\s), but exclude "\t". How can I do this?
planetp
  • 14,248
  • 20
  • 86
  • 160
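
One common approach to the question above is a double negation: negate the complementary shorthand and add the excluded character to the negated class. The sketch below uses Python's `re` module as an assumption, since the question does not name an engine; the sample strings are made up.

```python
import re

# [^\W_] = "not a non-word character and not an underscore",
# i.e. \w with "_" removed.
word_no_underscore = re.compile(r"[^\W_]+")
# [^\S\t] = "not a non-whitespace character and not a tab",
# i.e. \s with "\t" removed.
space_no_tab = re.compile(r"[^\S\t]")

words = word_no_underscore.findall("foo_bar baz")
spaces = space_no_tab.findall("a b\tc\nd")
```

Engines that support class intersection or subtraction offer other spellings, but the double negation works in most regex flavors that have `\W` and `\S`.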
29
votes
2 answers

Replace all characters not in range (Java String)

How do you replace all of the characters in a string that do not fit a criterion? I'm having trouble specifically with the NOT operator. Specifically, I'm trying to remove all characters that are not digits. I've tried this so far: String number…
Chris Dutrow
  • 48,402
  • 65
  • 188
  • 258
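
The "NOT" in the question above is usually expressed as a negated character class, `[^...]`. The sketch below shows the idea in Python rather than Java, with a made-up input string; the same class, `[^0-9]`, can be passed to Java's `String.replaceAll`.

```python
import re

raw = "(555) 123-4567"   # hypothetical input, not from the question

# [^0-9] matches every character that is NOT an ASCII digit;
# replacing each match with "" leaves only the digits.
digits = re.sub(r"[^0-9]", "", raw)
```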
29
votes
5 answers

Exclude characters from a character class

Is there a simple way to match all characters in a class except a certain set of them? For example if in a language where I can use \w to match the set of all Unicode word characters, is there a way to just exclude a character like an underscore…
Dan Roberts
  • 4,664
  • 3
  • 34
  • 43
24
votes
2 answers

Why is a character class faster than alternation?

It seems that using a character class is faster than the alternation in an example like: [abc] vs (a|b|c) I have heard about it being recommended and with a simple test using Time::HiRes I verified it (~10 times slower). Also using (?:a|b|c) in case…
Jim
  • 18,826
  • 34
  • 135
  • 254
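
To reproduce this kind of comparison, one might first confirm that both patterns match the same text and then time them. The sketch below does this in Python with `timeit`, not the Perl `Time::HiRes` setup from the question, so the numbers it produces say nothing about Perl; the sample text and the `time_pattern` helper are assumptions for illustration.

```python
import re
import timeit

text = "abcabd" * 1000   # made-up sample input

char_class = re.compile(r"[abc]")
alternation = re.compile(r"(?:a|b|c)")

# Both patterns accept exactly the characters a, b and c.
same_matches = char_class.findall(text) == alternation.findall(text)

def time_pattern(pattern, repeats=20):
    """Return the total seconds spent running findall `repeats` times."""
    return timeit.timeit(lambda: pattern.findall(text), number=repeats)

class_seconds = time_pattern(char_class)
alternation_seconds = time_pattern(alternation)
```

Comparing `class_seconds` with `alternation_seconds` gives a rough local measurement only; the ratio depends on the engine, the input, and the machine.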
19
votes
1 answer

General approach for (equivalent of) "backreferences within character class"?

In Perl regexes, expressions like \1, \2, etc. are usually interpreted as "backreferences" to previously captured groups, but not so when the \1, \2, etc. appear within a character class. In the latter case, the \ is treated as an escape character…
kjo
  • 33,683
  • 52
  • 148
  • 265
17
votes
3 answers

Matching (e.g.) a Unicode letter with Java regexps

There are many questions and answers here on StackOverflow that assume a "letter" can be matched in a regexp by [a-zA-Z]. However with Unicode there are many more characters that most people would regard as a letter (all the Greek letters, Cyrillic…
The Archetypal Paul
  • 41,321
  • 20
  • 104
  • 134
14
votes
1 answer

Python: POSIX character class in regex?

How can I search for, say, a sequence of 10 isprint characters in a given string in Python? With GNU grep, I would simply do grep [[:print:]]{10}
nodakai
  • 7,773
  • 3
  • 30
  • 60
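
Python's standard `re` module does not implement POSIX bracket expressions such as `[[:print:]]`. For ASCII input, one workaround is to spell the class out as a range: the printable ASCII characters run from space (0x20) to tilde (0x7E). The sketch below assumes ASCII-only data and uses a made-up sample string.

```python
import re

# ASCII stand-in for [[:print:]]{10}: ten consecutive printable characters.
ten_printable = re.compile(r"[\x20-\x7e]{10}")

sample = "\x00\x01hello world!\x02"   # hypothetical input
match = ten_printable.search(sample)
found = match.group(0) if match else None
```

This range does not cover non-ASCII printable characters, so it is only an approximation of `isprint` outside the C locale.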
13
votes
3 answers

How to match Unicode vowels?

What character class or Unicode property will match any Unicode vowel in Perl? Wrong answer: [aeiouAEIOU]. (sermon here, item #24 in the laundry list) perluniprops mentions vowels only for Hangul and Indic scripts. Let's set aside the question what…
n.r.
  • 1,900
  • 15
  • 20
13
votes
3 answers

Character class subtraction, converting from Java syntax to RegexBuddy

Which regular expression engine does Java use? In a tool like RegexBuddy, if I use [a-z&&[^bc]], the expression works in Java, but RegexBuddy does not understand it. In fact it reports: Match a single character present in the list below…
xdevel2000
  • 20,780
  • 41
  • 129
  • 196
12
votes
8 answers

How to count uppercase and lowercase letters in a string?

I'm trying to make a program that takes string input from the user, for instance "ONCE UPON a time", and then reports back how many uppercase and lowercase letters the string contains. Output example: the string has 8 uppercase letters the…
yyzzer1234
  • 153
  • 1
  • 1
  • 7
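
One way to do this kind of counting with character classes is to collect the matches of `[A-Z]` and `[a-z]` and count them. The sketch below is in Python and assumes ASCII letters only; the variable names are made up, and the input is the example string from the question.

```python
import re

text = "ONCE UPON a time"

# Each findall returns one list entry per matching character.
uppercase_count = len(re.findall(r"[A-Z]", text))
lowercase_count = len(re.findall(r"[a-z]", text))

summary = (f"the string has {uppercase_count} uppercase letters "
           f"and {lowercase_count} lowercase letters")
```

Spaces and other non-letters fall outside both classes, so they are not counted.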
10
votes
1 answer

Character classes in ranges - vim

Given the following string: This is a test {{ string.string.string }}. When I try to perform the following substitution: %s/{{ [\w\.]\+ }}/substitute/g it fails with the error: Pattern not found. When I use: %s/{{ [a-zA-Z\.]\+…
BergmannF
  • 9,727
  • 3
  • 37
  • 37
9
votes
4 answers

Hyphen and underscore not compatible in sed

I'm having trouble getting sed to recognize both hyphen and underscore in its pattern string. Does anyone know why [a-z|A-Z|0-9|\-|_] in the following example works like [a-z|A-Z|0-9|_] ? $ cat /tmp/sed_undescore_hypen lkjdaslf lkjlsadjfl…
techie11
  • 1,243
  • 15
  • 30
7
votes
5 answers

What built-in regex character classes are supported in Java?

...when used in patterns like "\\p{someCharacterClass}". I've used/seen some: Lower Upper InCombiningDiacriticalMarks ASCII What is the definitive list of all supported built-in character classes? Where is it documented? What are the exact…
Bohemian
  • 412,405
  • 93
  • 575
  • 722