
In a text file I have chapters and verses, and I need to extract the chapter numbers and verse numbers. The format for a chapter is ["CHAPTER "] [number]; the format for a verse is [number] [text]. I have a regular expression that prints the number of the chapter containing a searched word, but for the verse it prints the entire verse and not just the number. I need the number of the verse, not the text.

String patt = "((?<chapter>CHAPTER\\s\\d{1,3}) (?<verse>\\d{1,3})(?<verseText>.*))|(^(?<verse2>\\d{1,3})(?<verseText2>.*))";

How would I extend this to return the number of the verse instead of the text? The verses are listed one per line in a text document, and the number of the verse is at the beginning of each line. Thanks for the help.

Hydra
  • `"^(?:\\[(\\w*)\\]\\[(\\d*)\\]|\\[(\\d*)\\]\\[(\\w*)\\])$"` – Avinash Raj Dec 10 '14 at 16:37
  • Is your chapter always a single word? What I mean is, does an example string of what you are matching look like: Chapter 6 Verse 12? Or can the Chapter spot hold different and/or multiple words, like The Epic Chapter 6 Verse 12? – Daileyo Dec 10 '14 at 16:40
  • Thanks Avinash, would it be possible to get a layout of the code, with declarations, that outputs this to the user? No, it is not; the file has the title at the top, followed by the chapters, which look a bit like this: CHAPTER 1 1 (SOME TEXT) 2 (SOME TEXT) 3 (SOME TEXT) and so on... CHAPTER 2 1 (SOME TEXT) 2 (SOME TEXT). Each verse is on a different line. – Hydra Dec 10 '14 at 16:54

1 Answer


You could do something like this:

(?'Chapter'\w* ){1,3}(?'chapter_number'\d{1,3}) (?'Verse'\w*){1} (?'verse_number'\d{1,3})

You probably don't need to worry about doing a general match on the chapter and verse, since it sounds like you know they will always be the same word(s). As such, you could simplify the above to:

(?'chapter'CHAPTER \d{1,3}) (?'Verse'\d{1,3})

The labels give you a means of distinguishing between the numbers, and the ranges let you be explicit about how many digits each number may have.

Update

If you need it to match both the CHAPTER 1 1 (some text) and the 2 (some text) scenarios, you could also do this:

((?'chapter'CHAPTER \d{1,3}) (?'verse'\d{1,3})(?'verse_text2'.*))|(^(?'verse2'\d{1,3})(?'verse_text'.*))

You can try these out here. I find the site to be helpful at times for doing sanity checks.

Since you are working with Java, this site could be more helpful to you.

There are some syntax differences with group naming in Java. This Stack Overflow answer is quite nice for calling out the use and some of the limitations.
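To make the naming difference concrete, here is a minimal sketch, not a definitive implementation. It assumes the standard java.util.regex package; the class name NamedGroupSyntax and the helper methods chapterAndVerse and compiles are made up for illustration. Java writes named groups as (?<name>...), and group names may contain only ASCII letters and digits, so neither the (?'name'...) form nor a name such as chapter_number compiles.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class NamedGroupSyntax {
    // Java spells named groups (?<name>...); names are letters and digits only.
    static final Pattern CHAPTER_VERSE =
            Pattern.compile("CHAPTER (?<chapter>\\d{1,3}) (?<verse>\\d{1,3})");

    /** Returns "chapter:verse" for the first match, or null when nothing matches. */
    static String chapterAndVerse(String line) {
        Matcher m = CHAPTER_VERSE.matcher(line);
        return m.find() ? m.group("chapter") + ":" + m.group("verse") : null;
    }

    /** Returns true when java.util.regex accepts the given pattern text. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(chapterAndVerse("CHAPTER 6 12 Some text"));
        // The quote-delimited form is rejected by Java.
        System.out.println(compiles("(?'chapter'CHAPTER \\d{1,3})"));
        // So is an underscore inside a group name.
        System.out.println(compiles("(?<chapter_number>\\d{1,3})"));
    }
}
```

Run as written, this should print 6:12 followed by false twice, which is why the group names in a Java version need to drop the quotes and underscores.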

Last edit to show an example that is a little more Java compliant. Try it on the RegexPlanet site.

((?<chapter>CHAPTER \d{1,3}) (?<verse>\d{1,3})(?<verseText>.*))|(^(?<verse2>\d{1,3})(?<verseText2>.*))

I used the following for my test input.

The Book About Old Moldy Cheese    

CHAPTER 1 1 The chease is old and moldy.
2 No it isn't
3 Yes it is
4 No it isn't
5 I said, yes it is.  
Yes it is. Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.   Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is.  Yes it is. 
6  Lame story
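As a rough sketch of how the Java-flavoured pattern above might be applied to input like this, the code below walks the matches and reports chapter and verse numbers for verses containing a search word. Several things here are my assumptions rather than part of the original pattern: the class name VerseNumbers and the method findVerses are hypothetical, the chapter group captures only the digits, the outer groups are non-capturing, and Pattern.MULTILINE is switched on so that ^ matches at the start of every line. Note that in an alternation, the named groups of the branch that did not match come back as null from group(...), so the code checks which branch matched before reading the verse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerseNumbers {
    // First alternative: "CHAPTER n v text" on one line.
    // Second alternative: "v text" at the start of a line.
    // MULTILINE makes ^ match after every line break, not only at the start of the input.
    static final Pattern LINE = Pattern.compile(
            "(?:CHAPTER (?<chapter>\\d{1,3}) (?<verse>\\d{1,3})(?<verseText>.*))"
            + "|(?:^(?<verse2>\\d{1,3})(?<verseText2>.*))",
            Pattern.MULTILINE);

    /** Returns "chapter:verse" for every verse whose text contains the given word. */
    static List<String> findVerses(String text, String word) {
        List<String> hits = new ArrayList<>();
        String chapter = null; // stays null until a CHAPTER line has been seen
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            String verse;
            String verseText;
            if (m.group("chapter") != null) {
                // First alternative matched; verse2 and verseText2 are null here.
                chapter = m.group("chapter");
                verse = m.group("verse");
                verseText = m.group("verseText");
            } else {
                // Second alternative matched; chapter, verse and verseText are null here.
                verse = m.group("verse2");
                verseText = m.group("verseText2");
            }
            if (verseText.contains(word)) {
                hits.add(chapter + ":" + verse);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "The Book About Old Moldy Cheese\n\n"
                + "CHAPTER 1 1 The cheese is old and moldy.\n"
                + "2 No it isn't\n"
                + "3 Yes it is\n"
                + "4 No it isn't\n";
        System.out.println(findVerses(text, "isn't"));
    }
}
```

With the shortened input in main, this should print [1:2, 1:4]. Lines that do not start with a number, such as the title or a wrapped continuation of a verse, are simply skipped. If CHAPTER sits on its own line with the verses below it, the first alternative would presumably need to stop after the chapter number instead of expecting a verse on the same line.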

I hope this helps.

Daileyo
  • Thanks, how would this be implemented? I tried to implement it using Pattern and Matcher, calling .find() followed by .group() for the output, yet it doesn't seem to be working. Also, the layout of the text file is as follows: (Name of Book), then CHAPTER 1 on a separate line, followed by 1 (text), 2 (text), with each verse on a different line. This continues for several chapters and verses. – Hydra Dec 10 '14 at 18:19
  • I'm not a Java guru, so I may not be much help there. How you implement it depends on what you need. Just to make the above work in Java, you'd need to change the syntax to be Java compliant. Then you'd also need to account for whether or not you want multiline matching and so on. Try my latest edit on the RegexPlanet page. I'm not really sure the match is giving you what you need as an end result, but hopefully it gives you an idea of how to get there. – Daileyo Dec 10 '14 at 18:40
  • Thanks a lot for the help, much appreciated! Unfortunately, I keep getting null values in the output instead. – Hydra Dec 10 '14 at 22:38