What is the regular expression for all non-alphabetic characters in PHP?

Question

I'm trying to parse a file and analyze it. To do this, I've used preg_split() to break the document into an array. I only want words in the array (that is, alphabetic characters only). The regular expression I used is:

$noAlpha = "/[\s]+|[^A-z]+|\W|\r/";

However, I'm getting empty strings in the array. I believe it has to do with lines that contain only a carriage return (\r) and nothing else.

I'm only using .txt files. What would I need to add to the regex to account for this?

You may just extract the words with `preg_match_all('~\p{L}+~', $text, $words)` — Wiktor Stribiżew, Mar 14 '17 at 14:57
`[^A-z]` likely doesn't match what you expect: http://stackoverflow.com/questions/4923380/difference-between-regex-a-z-and-a-za-z — Sebastian Proske, Mar 14 '17 at 14:58
Thank you for the comments, they helped me become aware of some details I didn't consider — jKim83, Mar 14 '17 at 15:52
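
The `[A-z]` remark in the comments can be checked directly. The following is a minimal sketch, assuming ASCII input; the variable names are only illustrative. It collects the characters that sit between `Z` and `a` in ASCII and tests them against both ranges.

```php
<?php
// The ASCII range A-z spans code points 65..122, which includes the six
// characters between 'Z' (90) and 'a' (97): [ \ ] ^ _ `
$between = '';
for ($code = ord('Z') + 1; $code < ord('a'); $code++) {
    $between .= chr($code);
}

// All of these characters are accepted by [A-z] ...
$matchesWideRange = preg_match('/^[A-z]+$/', $between);

// ... but none of them is accepted by [A-Za-z].
$matchesLetterRange = preg_match('/[A-Za-z]/', $between);

var_dump($between, $matchesWideRange, $matchesLetterRange);
```

So `[^A-z]` treats characters such as `_` and `^` as if they were letters, and they stay attached to the words after a split.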

score 2 · Accepted Answer · answered Mar 14 '17 at 15:19

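To extract all the words (letters only), you can use:

preg_match_all('/[^\W\d_]+/', $string, $matches);

The class [^\W\d_] matches any word character (\w) that is not a digit or an underscore, which leaves only letters. If you want digits as well, use the pattern '/[^\W_]+/'.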
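
As a rough illustration of the two patterns above, here is a short sketch run on a made-up sample string (the input and the variable names are assumptions, not from the question). The last call uses `\p{L}` with the `u` modifier, as suggested in the comments, which is worth considering if the text may contain non-ASCII letters encoded as UTF-8.

```php
<?php
// Made-up sample: a line holding only "\r\n", an underscore, and digits.
$text = "Hello, world!\r\n\r\nfoo_bar 42 baz";

// Letters only: word characters minus digits and underscore.
preg_match_all('/[^\W\d_]+/', $text, $matches);
$words = $matches[0];

// Letters and digits: word characters minus underscore.
preg_match_all('/[^\W_]+/', $text, $matches);
$wordsAndDigits = $matches[0];

// Unicode letters: \p{L} needs the u modifier to treat the input as UTF-8.
preg_match_all('/\p{L}+/u', "na\u{00EF}ve caf\u{00E9}", $matches);
$unicodeWords = $matches[0];

print_r($words);
print_r($wordsAndDigits);
print_r($unicodeWords);
```

Because preg_match_all() collects the matches rather than the gaps between delimiters, empty lines cannot produce empty entries in the result.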

ManzoorWani

score 1 · Answer 2 · answered Mar 14 '17 at 15:12

Try this: $noAlpha = "/\s+|[^a-zA-Z]+|\W|\r/";
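
Note that an alternation like this can still leave empty strings, because adjacent delimiters (for example a space followed by a comma) may be matched separately. A possible variation, not part of the answer above, is to split on a single negated class and pass the PREG_SPLIT_NO_EMPTY flag. The sample text and variable names below are assumptions for illustration.

```php
<?php
// Made-up sample: leading whitespace and a line holding only "\r\n".
$text = "  Hello , world!\r\n\r\nfoo";

// With the alternation, " " and ", " are matched as two separate
// delimiters, which leaves an empty string between them.
$withBlanks = preg_split('/\s+|[^a-zA-Z]+|\W|\r/', $text);

// A single negated class consumes each whole run of non-letters, and
// PREG_SPLIT_NO_EMPTY drops empty pieces at the start or end.
$words = preg_split('/[^a-zA-Z]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($withBlanks);
print_r($words);
```

The third argument (-1) means "no limit", and it has to be supplied in order to reach the flags parameter.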

Harm van der Wal

score 1 · Answer 3 · answered Mar 14 '17 at 15:14

You can try this:

$noAlpha = "/\s*\W\s*/";

However, I would also extract the words with preg_match_all() instead.

csirmazbendeguz

Thanks, I ended up using preg_match_all – jKim83 Mar 14 '17 at 15:52
