
I am creating a Bible search. The trouble with Bible searches is that people enter many different kinds of references, and I need to split them up accordingly. So I figured the best way to start would be to remove all spaces and work through the string from there. Different types of searches could be:

Genesis 1:1 - Genesis Chapter 1, Verse 1

1 Kings 2:5 - 1 Kings Chapter 2, Verse 5

Job 3 - Job Chapter 3

Romans 8:1-7 - Romans Chapter 8 Verses 1 to 7

1 John 5:6-11 - 1 John Chapter 5 Verses 6 to 11

I am not too fazed by the different types of searches, but if anyone can find a simpler way to do this, or knows of a great way to do it, please tell me how!

Thanks

Chud37

5 Answers


The easiest thing to do here is to write a regular expression to capture the text, then parse the captures to see what you got. To start, let's assume you have a test bench:

$tests = array( 
    'Genesis 1:1' => 'Genesis Chapter 1, Verse 1',
    '1 Kings 2:5' => '1 Kings Chapter 2, Verse 5',
    'Job 3' => 'Job Chapter 3',
    'Romans 8:1-7' => 'Romans Chapter 8, Verses 1 to 7',
    '1 John 5:6-11' => '1 John Chapter 5, Verses 6 to 11'
);

So, you have, from left to right:

  1. A book name, optionally prefixed with a number
  2. A chapter number
  3. An optional verse number, itself optionally followed by a range.

So, we can write a regex to match all of those cases:

$regex = '/((?:\d+\s)?\w+)\s+(\d+)(?::(\d+(?:-\d+)?))?/';
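To make each part of the pattern easier to follow, the same regex can be written with PCRE's x (extended) modifier, which ignores whitespace inside the pattern and lets `#` start a comment. A small sketch; the sample input is one of the test cases above:

```php
<?php
// The pattern from above, rewritten with the x (extended) modifier:
// whitespace in the pattern is ignored and "#" starts a comment.
$regex = '/
    ( (?:\d+\s)? \w+ )            # 1: book name, optionally prefixed with a number
    \s+                           # space between book and chapter
    ( \d+ )                       # 2: chapter number
    (?: : ( \d+ (?:-\d+)? ) )?    # 3: optional verse, or verse range such as 6-11
/x';

preg_match($regex, '1 John 5:6-11', $match);
print_r($match);
```

Here `$match[1]` is `1 John`, `$match[2]` is `5` and `$match[3]` is `6-11`.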

And now see what we get back from the regex:

foreach( $tests as $test => $answer) {
    // Match the regex against the test case
    preg_match( $regex, $test, $match);

    // Ignore the first entry, the 2nd and 3rd entries hold the book and chapter
    list( , $book, $chapter) = array_map( 'trim', $match);

    $output = "$book Chapter $chapter";

    // If the fourth match exists, we have a verse entry
    if( isset( $match[3])) {
        // If there is no dash, it's a single verse
        if( strpos( $match[3], '-') === false) {
            $output .= ", Verse " . $match[3];
        } else {
            // Otherwise it's a range of verses
            list( $start, $end) = explode( '-', $match[3]);
            $output .= ", Verses $start to $end";
        }
    }
    // Here $output matches the value in $answer from our test cases
    echo $answer . "\n" . $output . "\n\n";
}

You can see it working in this demo.
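If you want the parts as data rather than as a display string, the same idea can be wrapped in a function. This is a sketch with a few assumptions that are not in the answer above: `parseReference` is a hypothetical name, the pattern is anchored with `^` and `$` so stray text makes the whole match fail, optional spaces are allowed, `v` is accepted as well as `:` before the verse (as in the procedural answer below), and book names may contain several words, such as "Song of Solomon":

```php
<?php
// Hypothetical helper: parse one reference into its parts, or return null
// if the whole string is not a reference.
function parseReference(string $ref): ?array
{
    // book: optional number, then one or more words; chapter: digits;
    // then an optional ':' or 'v', a start verse and an optional '-end'.
    $regex = '/^\s*(?<book>(?:\d+\s*)?[a-z]+(?:\s+[a-z]+)*)\s*(?<chapter>\d+)'
           . '(?:\s*[:v]\s*(?<start>\d+)(?:\s*-\s*(?<end>\d+))?)?\s*$/i';
    if (!preg_match($regex, $ref, $m)) {
        return null;
    }
    // Trailing groups that did not take part in the match are absent from $m.
    return [
        'book'    => $m['book'],
        'chapter' => (int) $m['chapter'],
        'start'   => isset($m['start']) && $m['start'] !== '' ? (int) $m['start'] : null,
        'end'     => isset($m['end']) && $m['end'] !== '' ? (int) $m['end'] : null,
    ];
}
```

For example, `parseReference('Romans 8:1-7')` returns `['book' => 'Romans', 'chapter' => 8, 'start' => 1, 'end' => 7]`, while `parseReference('Job 3')` leaves `start` and `end` as `null`.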

nickb

I think I understand what you are asking here. You want to devise an algorithm that extracts information (e.g. book name, chapter, verse or verses).

This looks to me like a job for pattern matching (e.g. regular expressions), because you could then define patterns, extract data for all the scenarios that make sense, and work from there.

There are actually quite a few variants that could exist, so perhaps you should also take a look at natural language processing. Fuzzy string matching on book names could give better results (e.g. when people misspell book names).
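Fuzzy matching of book names can be sketched with PHP's built-in `levenshtein()` function, which counts the single-character edits needed to turn one string into another. The `closestBook` name, the short book list and the cut-off of two edits are illustrative assumptions:

```php
<?php
// Hypothetical helper: map a possibly misspelled book name to the closest
// name in a known list, using levenshtein() edit distance.
function closestBook(string $input, array $books, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($books as $book) {
        // Compare case-insensitively so "genesis" and "Genesis" are equal.
        $distance = levenshtein(strtolower($input), strtolower($book));
        if ($distance < $bestDistance) {
            $best = $book;
            $bestDistance = $distance;
        }
    }
    // Reject matches that needed too many edits.
    return $bestDistance <= $maxDistance ? $best : null;
}

// A short sample list, not the full set of books.
$books = ['Genesis', 'Job', 'Romans', '1 Kings', '1 John'];
echo closestBook('Gensis', $books), "\n"; // prints "Genesis"
```

A real version would use the full list of book names and common abbreviations; the cut-off stops unrelated input from being forced onto the nearest book.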

Best of luck

Shelakel

Try out something based on preg_match_all, like:

$ php -a
Interactive shell

php > $s = '1 kings 2:4 and 1 sam 4-5';
php > preg_match_all("/(\\d+|[^\\d ]+| +)/", $s, $parts);
php > print_r($parts[0]);
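The sample string above holds two references. A sketch of extracting every reference in one pass with `preg_match_all` and the `PREG_SET_ORDER` flag, reusing the pattern from the regex answer above. It assumes references are separated by punctuation such as `;`: with this simple pattern, a joining word in front of a number (the "and 1" in "and 1 sam") would itself be read as a book and chapter.

```php
<?php
// PREG_SET_ORDER groups the captures of each reference together,
// so $refs[0] holds the first reference, $refs[1] the second, and so on.
$s = 'Genesis 1:1; 1 John 5:6-11';
preg_match_all('/((?:\d+\s)?\w+)\s+(\d+)(?::(\d+(?:-\d+)?))?/', $s, $refs, PREG_SET_ORDER);

foreach ($refs as $ref) {
    // $ref[1] = book, $ref[2] = chapter, $ref[3] = verse or range (if present)
    echo $ref[0], "\n";
}
```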

Don

Okay, well, I am not too sure about regular expressions and haven't studied them yet, so I am stuck with the more procedural approach. I have written the following, which seems to work flawlessly (and is still a huge improvement on the code I wrote five years ago, which is what I was aiming for):

You need this function first of all:

    // Returns false for digits and true for any other text (letters, ':', 'v', '-').
    function varType($str) {
        return !is_numeric($str);
    }

    // $collapse is the search string with all spaces removed, as described in the question.
    // $input (the raw search text) is a placeholder name.
    $collapse = str_replace(' ', '', $input);

    $bible = array("BookNumber" => "", "Book" => "", "Chapter" => "", "StartVerse" => "", "EndVerse" => "");
    $pos = 1; // 1 - Book Number
              // 2 - Book
              // 3 - Chapter
              // 4 - ':' or 'v'
              // 5 - StartVerse
              // 6 - Dash for spanning verses '-'
              // 7 - EndVerse
    $scan = "";
    $compile = array();

    // Divide into character type groups: runs of digits and runs of everything else.
    for ($x = 0; $x < strlen($collapse); $x++) {
        if ($x >= 1 && varType($collapse[$x]) != varType($collapse[$x - 1])) {
            $compile[] = $scan;
            $scan = "";
        }
        $scan .= $collapse[$x];
    }
    if ($scan !== "") {
        $compile[] = $scan;
    }

    // If the first group is not a number, this is not a numbered book (e.g. 1 John, 2 Kings),
    // so move the position forward.
    if (isset($compile[0]) && varType($compile[0])) {
        $pos = 2;
    }
    foreach ($compile as $val) {
        if (!varType($val)) {
            switch ($pos) {
                case 1: $bible['BookNumber'] = $val; break;
                case 3: $bible['Chapter'] = $val;    break;
                case 5: $bible['StartVerse'] = $val; break;
                case 7: $bible['EndVerse'] = $val;   break;
            }
        } elseif ($pos == 2) {
            $bible['Book'] = $val;
        }
        // Positions 4 (':' or 'v') and 6 ('-') are separators and are skipped.
        $pos++;
    }

This gives you an array called `$bible` at the end that holds all the data needed to query an SQL database, or whatever else you might want it for. Hope this helps others.
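To run the lookup, the `$bible` array can be turned into a parameterised query for PDO instead of pasting the values into the SQL string. A sketch: the `verses` table, its columns and the `buildVerseQuery` name are hypothetical, and storing numbered books as "1 John" is an assumption about the data:

```php
<?php
// Hypothetical schema: table `verses` with columns book, chapter, verse, text.
// Returns the SQL and its parameters as [$sql, $params].
function buildVerseQuery(array $bible): array
{
    // Rejoin the book number and name, e.g. "1" + "John" -> "1 John".
    $book = trim($bible['BookNumber'] . ' ' . $bible['Book']);
    $sql = 'SELECT verse, text FROM verses WHERE book = ? AND chapter = ?';
    $params = [$book, (int) $bible['Chapter']];

    if ($bible['StartVerse'] !== '') {
        // A single verse is treated as a range of length one.
        $end = $bible['EndVerse'] !== '' ? $bible['EndVerse'] : $bible['StartVerse'];
        $sql .= ' AND verse BETWEEN ? AND ?';
        $params[] = (int) $bible['StartVerse'];
        $params[] = (int) $end;
    }
    return [$sql . ' ORDER BY verse', $params];
}
```

With a PDO connection in `$pdo`, this would be used as `[$sql, $params] = buildVerseQuery($bible);` followed by `$stmt = $pdo->prepare($sql); $stmt->execute($params);`.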

Chud37

I know this is crazy talk, but why not just have a form with 4 fields so they can specify:

  1. Book
  2. Chapter
  3. Starting Verse
  4. Ending Verse [optional]
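With separate fields, parsing reduces to validation. A sketch using PHP's `filter_var()` with `FILTER_VALIDATE_INT`; `readReferenceForm` and the field names `book`, `chapter`, `start` and `end` are hypothetical, and an empty ending verse is treated as a single-verse lookup:

```php
<?php
// Hypothetical helper: validate the four form fields, or return null.
function readReferenceForm(array $fields): ?array
{
    // Chapter and verse numbers must be integers of at least 1.
    $positive = ['options' => ['min_range' => 1]];
    $book = trim($fields['book'] ?? '');
    $chapter = filter_var($fields['chapter'] ?? '', FILTER_VALIDATE_INT, $positive);
    $start = filter_var($fields['start'] ?? '', FILTER_VALIDATE_INT, $positive);
    // The ending verse is optional; fall back to the starting verse.
    $endRaw = $fields['end'] ?? '';
    $end = $endRaw === '' ? $start : filter_var($endRaw, FILTER_VALIDATE_INT, $positive);

    if ($book === '' || $chapter === false || $start === false || $end === false || $end < $start) {
        return null;
    }
    return ['book' => $book, 'chapter' => $chapter, 'start' => $start, 'end' => $end];
}
```

In a form handler this would be called as `readReferenceForm($_POST)`, with `null` meaning the input should be rejected.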

Sammitch
  • That thar be crazy talk! But it's just not what the big websites do, and therefore I wouldn't want to do that. Plus, many, many users are not aware of the Tab key, and it slows them down. – Chud37 Oct 30 '12 at 15:10