
As stated in the title, I'm trying to find a regex that matches a specific word (like apple) that may contain a newline (\r\n) at a random position.

To illustrate in more detail: let's say we want to find the word 'apple' in a text file, but we don't know the exact position of the newline (\r\n) in the file, like below:


                                                                           ap
ple

or

                                                                         appl
e

I also googled many pages, but I couldn't find the answer.

Do I have to write a beginner regex like the one below?

(a\r\npple|ap\r\nple|app\r\nle|appl\r\ne|apple)

I need a smarter regex to find the exact word.


Update:

The word can vary, like "ripe apple", "rotten apple" and "brightapple". In the third case, the white space was removed by the writer.

Update:

I have many txt files, and I have to find the string within them. So removing \r\n first is not practical (too much memory and time required).

qwertysigner
  • Yes, you need to insert the pattern that may occur in between the letters. Maybe `a\s*p\s*p\s*l\s*e` – Wiktor Stribiżew Aug 01 '17 at 10:19
  • Can you be more specific? Do you mean you want to ignore all whitespace and capture the word? If not, how do you decide which lines are part of a previous word, and which are a new word entirely? See http://www.regular-expressions.info/tutorial.html for a brilliant and thorough treatment of regexes. Otherwise please include a code example so we can see the target capture. – Hektor Aug 01 '17 at 10:22
  • Wiktor Stribizew - surely a\s*?p\s*?p\s*?l\s*?e would be an improvement on that dreadful regex... – Hektor Aug 01 '17 at 10:24
  • @Hektor Why? What's the point of making the quantifiers non-greedy? – melpomene Aug 01 '17 at 10:30
  • Thanks, everyone. I'm not writing a program, just using a regex, and I also care about white space between characters. I'm still looking for possible ways... – qwertysigner Aug 01 '17 at 10:37
  • @melpomene In case the word appears more than once in the text? – Hektor Aug 01 '17 at 10:39
  • @Hektor That makes no sense. Greediness doesn't affect whether a regex matches or where it matches. – melpomene Aug 01 '17 at 10:40
  • @Hektor With `a\s*?p` the `?` really does not change anything. Since `p` is not in `\s`, there is no ambiguity that `?` would resolve, unlike e.g. with `a.*?p`. – tobias_k Aug 01 '17 at 10:46
  • @melpomene Surely a greedy regex will match the longest possible string, even encompassing numerous instances of the match within that string if necessary? – Hektor Aug 01 '17 at 10:47
  • Actually my mistake, in this context it would make no difference...whatsoever! – Hektor Aug 01 '17 at 10:48
  • I'd recommend simply stripping all [\r\n] and then looking for the word. Clean your data up (with a regex) and then search for your term (with another regex). – sniperd Aug 01 '17 at 12:44
  • sniperd, my fault... your comment is correct, but I have thousands of text files in which I have to find words. It's not the case you mentioned. Thanks – qwertysigner Aug 01 '17 at 14:30
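
The idea from the first comment (allow something optional between every pair of letters) can be generated rather than typed by hand. Below is a minimal Python sketch, not the asker's or any commenter's actual code: the helper name `broken_word_pattern` is hypothetical, and it assumes that only a line break (`\r\n` or a bare `\n`) may split the word, so spaces inside the word are not tolerated, unlike with `\s*`.

```python
import re

def broken_word_pattern(word: str) -> re.Pattern:
    """Compile a regex matching `word` with an optional line break between letters."""
    # Assumption: only "\r\n" or "\n" may interrupt the word (not spaces or tabs).
    gap = r"(?:\r?\n)?"
    # Escape each character so words containing regex metacharacters stay literal.
    return re.compile(gap.join(re.escape(ch) for ch in word))

pattern = broken_word_pattern("apple")

# Sample text covering the cases from the question: broken words and a glued word.
text = "ripe ap\r\nple, rotten appl\r\ne and brightapple"
matches = [m.group(0) for m in pattern.finditer(text)]
```

For "apple" the generated pattern is `a(?:\r?\n)?p(?:\r?\n)?p(?:\r?\n)?l(?:\r?\n)?e`, which can also be pasted into any tool that accepts regexes. If the search is done from Python over many files, opening each file with `newline=""` keeps the `\r\n` sequences untranslated.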

1 Answer


You have to take the naive approach (the "beginner regex") if you want to use regular expressions, since they correspond to type-3 grammars and cannot express the state needed (see also The difference between Chomsky type 3 and Chomsky type 2 grammar).
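
If the naive alternation is the route taken, it does not have to be written by hand for every word. The following is only a sketch under stated assumptions: the function name `naive_alternation` is hypothetical, and the line break is assumed to be exactly `\r\n` and to occur at most once inside the word.

```python
import re

def naive_alternation(word: str, newline: str = r"\r\n") -> str:
    """Build the 'beginner regex': one alternative per possible break position."""
    # One alternative for each split point inside the word ...
    parts = [
        re.escape(word[:i]) + newline + re.escape(word[i:])
        for i in range(1, len(word))
    ]
    # ... plus the unbroken word itself.
    parts.append(re.escape(word))
    return "(" + "|".join(parts) + ")"

regex = naive_alternation("apple")
found = re.search(regex, "rotten appl\r\ne")
```

The resulting string can be copied into whatever search tool is used, so the same helper covers other search words as well.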

Laurent P