
I'm trying to parse a file and analyze it. To do this, I've used preg_split() to break the document into an array. I only want words in the array (in other words, runs of alphabetic characters). The regular expression I'm using is:

$noAlpha = "/[\s]+|[^A-z]+|\W|\r/";

However, I'm getting empty strings in the array. I believe it has to do with lines that contain only a carriage return (\r) and nothing else.

I'm only using .txt files. What would I need to add to the regex to account for this?
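
A minimal sketch, using a hypothetical sample string, that reproduces the empty strings with the pattern above and then removes them with preg_split()'s PREG_SPLIT_NO_EMPTY flag instead of changing the regex. In this sample the blanks come from a delimiter match at the very end of the string and from two delimiter matches that touch (a space matched by [\s]+, immediately followed by "- " matched by [^A-z]+). The blank line itself does not create one, because it is absorbed into a single delimiter match.

```php
<?php
// Hypothetical sample: a " - " separator, a blank line (\r\n\r\n)
// and a trailing line break.
$text = "Hello - world.\r\n\r\nNext line\r\n";

// The pattern from the question, unchanged.
$noAlpha = "/[\s]+|[^A-z]+|\W|\r/";

// Default split: a delimiter at the end of the string, or two delimiter
// matches that touch, leave empty strings in the result.
$withBlanks = preg_split($noAlpha, $text);

// Same pattern; PREG_SPLIT_NO_EMPTY drops the empty pieces (-1 = no limit).
$words = preg_split($noAlpha, $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($withBlanks);
print_r($words);
```

The limit argument has to be passed (as -1, meaning no limit) so that the flags can go in the fourth position.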

Sebastian Proske
jKim83

3 Answers


To extract all the words (letters only), you can use this:

preg_match_all('/[^\W\d_]+/', $string, $matches); // $matches[0] holds the words

If you want digits as well, the pattern should be '/[^\W_]+/'.
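
A short sketch of both patterns side by side, on a hypothetical input string. [^\W\d_] means "a word character that is not a digit or underscore", which for ASCII text leaves only letters; dropping \d from the class lets digits through as well. Because preg_match_all() collects matches rather than splitting on delimiters, no empty strings can appear in $matches[0].

```php
<?php
// Hypothetical input with digits, an underscore, punctuation and a blank line.
$string = "Room 101, floor_3!\r\n\r\nDone.";

// Letters only: word characters minus digits and underscore.
preg_match_all('/[^\W\d_]+/', $string, $matches);
$letters = $matches[0];

// Letters and digits: word characters minus underscore.
preg_match_all('/[^\W_]+/', $string, $matches);
$alnum = $matches[0];

print_r($letters);
print_r($alnum);
```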

ManzoorWani

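Try this:

$noAlpha = "/\s+|[^a-zA-Z]+|\W|\r/";

The important change is [^a-zA-Z]+ instead of [^A-z]+: the range A-z also includes the ASCII characters that sit between Z and a ([, \, ], ^, _ and the backtick), so they are treated as letters.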
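
A sketch comparing the question's pattern with this one on a hypothetical input. In the question's pattern the \W branch still catches [, \, ], ^ and the backtick, but _ is a word character, so with [^A-z] nothing splits on it and underscore-joined words come through as one token. PREG_SPLIT_NO_EMPTY is added here (it is not part of this answer) so that only the tokens are compared.

```php
<?php
// Hypothetical input containing an underscore.
$text = "snake_case and more";

// Question's pattern: '_' (0x5F) lies inside A-z and is a word character,
// so none of the branches treats it as a delimiter.
$question = "/[\s]+|[^A-z]+|\W|\r/";

// This answer's pattern: [^a-zA-Z]+ matches anything that is not an ASCII letter.
$answer = "/\s+|[^a-zA-Z]+|\W|\r/";

$before = preg_split($question, $text, -1, PREG_SPLIT_NO_EMPTY);
$after  = preg_split($answer, $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($before);
print_r($after);
```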


You can try this:

$noAlpha = "/\s*\W\s*/";

That said, I would also extract the words with preg_match_all() instead.
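
A sketch of what this pattern produces on a hypothetical input. Since \W does not match digits or _, a number such as 007 comes through as its own token, and a trailing "." still leaves one empty string at the end; passing PREG_SPLIT_NO_EMPTY (an addition, not part of this answer) removes it.

```php
<?php
// Hypothetical input: punctuation, a blank line, digits and a trailing period.
$text = "Hi, there!\r\n\r\nAgent 007 here.";

// One non-word character plus any whitespace around it.
$noAlpha = "/\s*\W\s*/";

// Plain split: the final "." is a delimiter at the end, so the last piece is "".
$parts = preg_split($noAlpha, $text);

// Same split with empty pieces dropped.
$clean = preg_split($noAlpha, $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
print_r($clean);
```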

csirmazbendeguz