0

I am working on something in MATLAB that processes text. For this I need my regular expression to work properly.

So far I have this regexp below, which works for almost everything, but I would like to add something to it so that it reads an apostrophe as a part of a word.

V1 = regexp(inpstr,'\w*[^a-zA-Z0-9\ _\ -\ "\ *\f\n\r\t\v\x20]?','match');

So, my example question is: if I have a string:

'Hi, let's play some ball.'

I would like the regexp to give me 'Hi,' - 'let's' - 'play' - 'some' - 'ball.'

and currently it gives me 'Hi,' - 'let' - 's' - 'play' - 'some' - 'ball.'

I guess the problem is that I can't just add \ ' to the regexp because of MATLAB's use of ' as the string delimiter. I tried just adding it and got this: ??? Error: File: TestScript.m Line: 13 Column: 38 The input character is not valid in MATLAB statements or expressions.

Any help would be greatly appreciated =)

Stewie Griffin
Frank

2 Answers

1

Try this:

\w*[^a-zA-Z0-9\ _\ -\ '\ "\ *\f\n\r\t\v\x20]?
Rinku
  • Try using \'' (the apostrophe doubled) instead of \' – Rinku Jun 11 '13 at 05:24
  • I found out that MATLAB represents the apostrophe ' as char(39) (its ASCII code)... Now I need to find a way to use char(39) in the regexp so that it works. – Frank Jun 12 '13 at 15:03
  • Once I find out how to use char(x) inside the regexp call, I can do something like regexp(inpstr,'\w*char(39)*\w*','match'), but I guess MATLAB does not evaluate char(39) when it sits inside a quoted string? ... I could use the . for 'any character' ... I'll try it with that. – Frank Jun 12 '13 at 15:19
0

The solution to my problem was this:

V1 = regexp(inpstr,'\w*[\'']*[^\_\-\"\*\s]*','match')

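Basically, [ ] matches any one of the characters listed inside it, and [^ ] matches any one character that is not listed. \s is a shortcut for all whitespace. Inside a single-quoted MATLAB string an apostrophe has to be written twice (''), which is why the pattern contains \'' rather than \'.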
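For anyone who wants to check this, here is a minimal sketch that runs the pattern above on the sentence from the question. The variable names pat, patAlt and V2 are only for illustration and are not part of the original code. It also builds the same pattern with char(39), as discussed in the comments, by concatenating the pieces instead of writing char(39) inside the quotes. It relies on regexp leaving out zero-length matches by default, which I believe is the documented behaviour.

```matlab
% Sample sentence from the question. Inside a single-quoted MATLAB
% string, '' stands for one literal apostrophe.
inpstr = 'Hi, let''s play some ball.';

% Pattern from the answer above: word characters, then optional
% apostrophes, then anything that is not _ - " * or whitespace.
pat = '\w*[\'']*[^\_\-\"\*\s]*';

% The same pattern assembled with char(39) by concatenation
% (patAlt is a hypothetical name used only for this comparison).
patAlt = ['\w*[\' char(39) ']*[^\_\-\"\*\s]*'];

V1 = regexp(inpstr, pat, 'match');
V2 = regexp(inpstr, patAlt, 'match');

% Show the tokens, one per line.
fprintf('%s\n', V1{:});
```

Both calls should split the sentence the way the question asks for, with let's kept as a single token. The concatenated form is mainly useful if you would rather not count quotes inside the pattern.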

Frank