I have a text like this:

...
Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg

sdgsdg
dgds
hfdhdf
h
fdh
dfh Sentence two. gdjshagjhsdga sdgjhsdkjgh adskjghdsa
gs a
gfdgfdhfdhh
...

I need to extract from this paragraph the text that lies between two strings (they are actually sentences): `Sentence one.` and `Sentence two.`.

Could you please help me with how to extract it?

Thanks

user984621
  • You didn't include what you have so far? – Jerry May 21 '13 at 16:27
  • I doubt you'll be able to differentiate an arbitrary real sentence from gibberish with a reasonable regular expression. Some kind of simple parser is probably going to be your best bet. – AndyPerfect May 21 '13 at 16:31
  • `/Sentence one(.*?)Sentence two/m` will work, but only if `Sentence one` and `Sentence two` are exact and not nested. – Explosion Pills May 21 '13 at 16:34
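A minimal Ruby sketch of the non-greedy, multiline regex from the last comment. It assumes the two markers are known literal strings that appear exactly once each. The helper name `text_between` and the shortened sample text are made up for illustration.

```ruby
# Sample text modeled on the question; the filler is arbitrary.
text = <<~TEXT
  Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg

  sdgsdg
  dgds
  dfh Sentence two. gdjshagjhsdga sdgjhsdkjgh adskjghdsa
  gs a
TEXT

# Hypothetical helper: returns the text between two literal markers,
# or nil when the markers are not found in that order.
def text_between(text, start_marker, end_marker)
  # Regexp.escape keeps the periods in the markers literal.
  # The /m flag lets "." match newlines; ".*?" stops at the first end marker.
  pattern = /#{Regexp.escape(start_marker)}(.*?)#{Regexp.escape(end_marker)}/m
  match = text.match(pattern)
  match && match[1]
end

between = text_between(text, "Sentence one.", "Sentence two.")
puts between.inspect
```

The captured string keeps the surrounding whitespace and newlines; call `strip` on the result if those are not wanted.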

3 Answers

Looking at what you have, the start and end of your sentence are a capital letter and a period, respectively. You can construct a regular expression that pulls out the text between a capital letter and the first period that comes after.

But this may be a contrived example; it looks like you may have typed random keys in the middle of the keyboard, so these may not be the characteristics of your actual gibberish.
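As a rough Ruby sketch of that idea, the pattern below matches from a capital letter to the first period after it. It assumes, as in the sample, that the gibberish contains no capital letters or periods of its own; the sample string is shortened for illustration.

```ruby
# Shortened sample: lowercase gibberish around two real sentences.
text = "hsjd Sentence one. hsjdhsd jgh\nsdgsdg\ndfh Sentence two. gdjsh"

# [A-Z]  a single capital letter (the start of a sentence)
# [^.]*  any characters except a period, including newlines
# \.     the first period that follows
sentences = text.scan(/[A-Z][^.]*\./)

puts sentences.inspect
```

If the real gibberish contains capitals or periods, this pattern will pick up false matches, which is the caveat noted above.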

John

Try something like this: `([A-Z]{1}.*\.)?`

Lifeweaver

Use a Conditional Flip-Flop Expression

Given your corpus as defined above:

ruby -ne 'puts $_ if /Sentence/ ... /Sentence/' /tmp/corpus

will output:

Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg

sdgsdg
dgds
hfdhdf
h
fdh
dfh Sentence two. gdjshagjhsdga sdgjhsdkjgh adskjghdsa
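
The same flip-flop can be sketched inside a script rather than a one-liner. This version assumes the text is already in a string and uses the full markers as the start and end conditions; the sample text and the variable names are illustrative.

```ruby
# Illustrative sample with one line before and one after the wanted range.
text = "intro line\nSentence one. aaa\nbbb\nccc Sentence two. ddd\noutro line\n"

selected = []
text.each_line do |line|
  # The three-dot flip-flop turns on at the line matching the first
  # condition and turns off after the line matching the second one.
  selected << line if (line =~ /Sentence one\./)...(line =~ /Sentence two\./)
end

puts selected.join
```

Note that a flip-flop selects whole lines, so any text before the first marker or after the second marker on those lines is kept as well.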
Todd A. Jacobs