Ruby - how to pull out the text that is between two "points"?

Question

I have a text like this:

...
Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg

sdgsdg
dgds
hfdhdf
h
fdh
dfh Sentence two. gdjshagjhsdga sdgjhsdkjgh adskjghdsa
gs a
gfdgfdhfdhh
...

I need to pull from this paragraph the text that lies between two strings (each is actually a sentence): `Sentence one.` and `Sentence two.`

Could you please help me work out how to extract it?

Thanks

I doubt you'll be able to differentiate an arbitrary real sentence from gibberish with a reasonable regular expression. Some kind of simple parser is probably going to be your best bet. — AndyPerfect, May 21 '13 at 16:31
`/Sentence one(.*?)Sentence two/m` will work, but only if `Sentence one` and `Sentence two` are exact and not nested. — Explosion Pills, May 21 '13 at 16:34
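
The regex from the comment above can be tried out with a short script. This is only a sketch: the shortened sample text and the names `text`, `start_marker`, `end_marker`, `pattern` and `between` are assumptions made for illustration, and it only works when both markers appear verbatim in the text.

```ruby
# Shortened sample modelled on the question's text (illustrative only).
text = <<~TEXT
  ...
  Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg

  sdgsdg
  dfh Sentence two. gdjshagjhsdga
  ...
TEXT

start_marker = "Sentence one."
end_marker   = "Sentence two."

# Regexp.escape keeps the "." in each marker literal.
# The /m flag lets "." match newlines; ".*?" stops at the first end marker.
pattern = /#{Regexp.escape(start_marker)}(.*?)#{Regexp.escape(end_marker)}/m

# String#[] with a regexp and a group index returns that capture,
# or nil when the pattern does not match.
between = text[pattern, 1]
puts between.strip if between
```

Escaping the markers matters here because an unescaped `.` would match any character, not just the period that ends each sentence.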

score 1 · Answer 1 · answered May 21 '13 at 16:35

Looking at what you have, the start and end of your sentence are a capital letter and a period, respectively. You can construct a regular expression that pulls out the text between a capital letter and the first period that comes after.

But this may be a contrived example; it looks like you may have typed random keys in the middle of the keyboard, so these may not be the characteristics of your actual gibberish.
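
A minimal sketch of that idea, assuming the surrounding gibberish contains no capital letters and no periods (true of the sample, but possibly not of real data). The names `text`, `sentences`, `first`, `second` and `between` are illustrative only.

```ruby
# Shortened sample modelled on the question's text (illustrative only).
text = <<~TEXT
  ...
  Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg
  sdgsdg
  dfh Sentence two. gdjshagjhsdga
  ...
TEXT

# [A-Z]  a capital letter that opens a sentence
# [^.]*  anything except a period (newlines included)
# \.     the first period after that capital
sentences = text.scan(/[A-Z][^.]*\./)

first, second = sentences.first(2)
between = nil

if first && second
  # Slice from the end of the first sentence to the start of the second.
  start   = text.index(first) + first.length
  stop    = text.index(second, start)
  between = text[start...stop]
end

puts between.strip if between
```

The character class `[^.]*` is used instead of `.*` so that the match cannot run past the first period, and so that no multiline flag is needed.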

score 1 · Answer 2 · answered May 21 '13 at 17:46

Try something like this: `([A-Z]{1}.*\.)?`

Lifeweaver

score 0 · Answer 3 · answered May 21 '13 at 22:46

Use a Conditional Flip-Flop Expression

Given your corpus as defined above:

ruby -ne 'puts $_ if /Sentence/ ... /Sentence/' /tmp/corpus

will output:

Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg

sdgsdg
dgds
hfdhdf
h
fdh
dfh Sentence two. gdjshagjhsdga sdgjhsdkjgh adskjghdsa
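
The same flip-flop can be used inside a script rather than a one-liner. The sketch below is an assumption-laden illustration: the in-memory `corpus` string stands in for the file and `selected` is a hypothetical name. The conditions are written with an explicit `line =~` because the bare `/Sentence/` form relies on `$_`, which `ruby -n` sets for each input line.

```ruby
# Shortened stand-in for the corpus file (illustrative only).
corpus = <<~TEXT
  ...
  Sentence one. aaa
  bbb
  ccc Sentence two. ddd
  eee
  ...
TEXT

selected = []
corpus.each_line do |line|
  # Three-dot flip-flop: switches on at the first line matching /Sentence/,
  # skips the end test on that same line, and switches off after the next
  # line that matches /Sentence/.
  selected << line if (line =~ /Sentence/)...(line =~ /Sentence/)
end

puts selected
```

This selects whole lines, so the markers themselves and any text after `Sentence two.` on its line are included in the result.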
