2

Is this possible?

For a sentence such as "hello how are you", I'd like my regular expression to return hello, how, are, and you. It only ever returns hello and none of the other words.

My Regex:

[A-Za-z]*

Any help is greatly appreciated. Thanks! If it matters, I'm using Pharo Smalltalk. I've been testing in too.

Nikolas Charalambidis
Dan

5 Answers

7

In Pharo you can also send the #substrings message:

'Hello how are you' substrings

and get the array:

#('Hello' 'how' 'are' 'you').
Leandro Caniglia
6

You can find a chapter about Regex in Pharo here:

https://ci.inria.fr/pharo-contribution/view/Books/job/DeepIntoPharo/lastSuccessfulBuild/artifact/tmp/PBE2.pdf

If you just want to split the string on spaces, you can run:

Character space split: 'My String To split'

You will get an OrderedCollection with all the words.

Cyril Ferlicot
5

If you only need to split the sentence on spaces, you can use the string.Split() method:

var s = "hello how are you";
var words = s.Split();

If you want to use regular expressions:

var s = "hello how are you";
var regex = "\\w+";
var words = Regex.Matches(s, regex).Cast<Match>().Select(m => m.Value);
Arturo Menchaca
2

You don't need the Regex in this case at all. Simply use Split.

string str = "hello how are you";
string[] parts = str.Split(' ');

If you really do want a regex, \w+ matches each word. In C# the pattern is written as string regex = "\\w+", which requires at least one word character per match.

  • \w matches any single word character: a letter, a digit, or an underscore
  • The + quantifier means one or more times
  • The * quantifier means zero or more times
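
Since the question is about Pharo, here is a small sketch of the difference between the two quantifiers. It assumes the Regex package's String extension #matchesRegex: is available in your image; it tests whether the whole receiver matches the pattern.

```smalltalk
"Assumption: the Regex package (and its String extensions) is loaded."

"* accepts zero word characters, so even the empty string matches."
'' matchesRegex: '\w*'.        "should answer true"

"+ needs at least one word character, so the empty string is rejected."
'' matchesRegex: '\w+'.        "should answer false"

"A single word satisfies both patterns."
'hello' matchesRegex: '\w+'.   "should answer true"
```

This is why a pattern ending in * can report empty matches, while + only reports actual words.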
Nikolas Charalambidis
2

The standard matches: message tries to match the whole string, which fails here because the string contains spaces:

matcher := RxMatcher forString: '[A-Za-z]*'.
matcher matches: 'hello how are you'

false

If you ask for all matches, it tells you there are 5, because * also matches zero characters, which produces the trailing empty string:

matcher := RxMatcher forString: '[A-Za-z]*'.
matcher matchesIn: 'hello how are you'

"an OrderedCollection('hello' 'how' 'are' 'you' '')"

And for the wanted result you might try

matcher := RxMatcher forString: '[A-Za-z]+'.
matcher matchesIn: 'hello how are you'

"an OrderedCollection('hello' 'how' 'are' 'you')"

and if you want to know how long the words are you can do

matcher := RxMatcher forString: '[A-Za-z]+'.
matcher matchesIn: 'hello how are you' collect: [ :each | each size ]

"an OrderedCollection(5 3 3 3)"
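
As a shorter alternative, the same results can probably be had without creating the matcher yourself. This is only a sketch, assuming the Regex package's String convenience messages #allRegexMatches: and #matchesRegex: are present in your image; the variable name sentence is just for illustration.

```smalltalk
"Assumption: the Regex package's String extensions are loaded."
| sentence |
sentence := 'hello how are you'.

"Collect every run of one or more letters, like matchesIn: above."
sentence allRegexMatches: '[A-Za-z]+'.
"an OrderedCollection('hello' 'how' 'are' 'you')"

"Whole-string match: succeeds once the space is allowed in the character class."
sentence matchesRegex: '[A-Za-z ]+'.
"true"
```

The second expression shows the other way to fix the original pattern: if you want the whole sentence as one match, the character class has to include the space.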
Stephan Eggermont