I'm trying to extract people's names from text files, which I read line by line. Because of how the files are structured, the first and last name should almost always be on the same line and within the first few lines of the file. Currently, I search for the first name in an array of ~2300 names and then assume that the following word is the last name.

The problem is that this approach doesn't reliably match the names, so it can pick up a different word in the file as the name. For example, my name is Daniel, but the function skips over my name and instead recognizes Virginia (a word later in the file) as my first name. Am I doing anything wrong, and is there a better way of doing this? I'm pretty new to PHP, so chances are I'm making a silly mistake.
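Since the name is expected within the first few lines, one option is to stop reading as soon as a line produces a match or a line limit is reached, so that a word much later in the file (such as "Virginia") can never be picked up. A minimal sketch, where the function name `findNameInHeader`, the `$matchLine` callback (assumed to return `[first, last]` or an empty array), and the default limit of 5 lines are all hypothetical:

```php
<?php
/**
 * Reads at most $maxLines lines of $path and returns the first [first, last]
 * pair produced by $matchLine, or an empty array if none of those lines match.
 * $matchLine is a hypothetical callback: string $line => array (possibly empty).
 */
function findNameInHeader(string $path, callable $matchLine, int $maxLines = 5): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return array(); // file could not be opened
    }

    $result = array();
    // Stop at the line limit or at end of file, whichever comes first.
    for ($n = 0; $n < $maxLines && ($line = fgets($handle)) !== false; $n++) {
        $name = $matchLine($line);
        if (count($name) === 2) {
            $result = $name; // first match wins; later lines are never read
            break;
        }
    }

    fclose($handle);
    return $result;
}
```

The callback can wrap whatever per-line matching logic is already in place; the only change is that the search is bounded to the header of the file.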
Clarifications: The file is a raw text file containing data extracted via OCR from pictures of resumes. For the purposes of my project, I am assuming that there is always a first and last name (no middle name), and that both are on the same line.
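Because the text comes from OCR, a name line may contain tabs, repeated spaces, a trailing `\r\n`, or stray punctuation. `explode(" ", $str)` only splits on single spaces, so it can produce tokens such as `"Daniel\tSmith"`, `"Smith,"` or `""` that never equal a dictionary entry. A minimal sketch of a more forgiving tokenizer (the name `tokenizeOcrLine` is hypothetical), assuming a name consists only of letters, apostrophes and hyphens:

```php
<?php
/**
 * Splits one OCR'd line into word tokens: splits on any run of whitespace
 * (spaces, tabs, \r, \n), then strips every character that is not a Unicode
 * letter, apostrophe or hyphen, dropping tokens that end up empty.
 */
function tokenizeOcrLine(string $line): array
{
    $tokens = preg_split('/\s+/u', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        return array(); // with the /u modifier, invalid UTF-8 makes preg_split fail
    }

    $clean = array();
    foreach ($tokens as $token) {
        $token = preg_replace('/[^\p{L}\'-]/u', '', $token);
        if ($token !== '') {
            $clean[] = $token;
        }
    }
    return $clean;
}
```

Each resulting token can then be lowercased with `mb_strtolower()` and looked up in the dictionary as before.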
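```php
// Caller: runs for each line of the file until a name has been found.
$name = $this->search($line);
if (count($name) > 0 && empty($fname) && empty($lname)) {
    $fname = $name[0];
    $lname = $name[1];
}

function search($str) { // $str is the current file line being read
    // Defines $dict; this file is re-included on every call.
    require "utils" . DIRECTORY_SEPARATOR . "dictionary-first-names.php";
    $arr = explode(" ", $str);
    for ($i = 0; $i < count($arr); $i++) {
        if (in_array(mb_strtolower($arr[$i]), $dict)) {
            // shouldn't have array out of bounds as first & last name should be on the same line
            return array($arr[$i], $arr[$i + 1]);
        }
    }
    // No match: return an empty array so count($name) works (count(null) throws a TypeError on PHP 8).
    return array();
}
```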
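`search()` re-`require`s the ~2300-entry dictionary for every line and checks membership with `in_array()`, which is a linear scan. It also only matches when the dictionary entries are already lowercase, because only the token is passed through `mb_strtolower()`. A minimal sketch that normalizes the dictionary once, keys an array by lowercased name, and tests membership with `isset()`. The class name `FirstNameMatcher` is hypothetical; in real use the constructor argument would be the `$dict` array from `dictionary-first-names.php`, assumed here to be a flat list of names:

```php
<?php
/**
 * Builds a lookup set from a first-name list once, then finds the first
 * dictionary hit in a list of words and returns it with the following word.
 */
final class FirstNameMatcher
{
    /** @var array<string, bool> lowercased first name => true */
    private $names = array();

    public function __construct(array $dict)
    {
        foreach ($dict as $name) {
            // Normalize entries the same way tokens are normalized in search(),
            // so "Daniel", "DANIEL" and " daniel " all become "daniel".
            $this->names[mb_strtolower(trim($name))] = true;
        }
    }

    /** Returns [first, last] for the first dictionary hit in $words, or []. */
    public function search(array $words): array
    {
        $words = array_values($words); // ensure keys are 0..n-1
        foreach ($words as $i => $word) {
            // isset() on an array key is a hash lookup, not a linear scan.
            if (isset($this->names[mb_strtolower($word)]) && isset($words[$i + 1])) {
                return array($word, $words[$i + 1]);
            }
        }
        return array(); // always an array, so count() on the result is safe
    }
}
```

The matcher would be constructed once (for example, when the class that reads the file is created) and then reused for every line, with `$words` coming from `explode(" ", $line)` or a whitespace-aware split.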
Here is a pastebin link to dictionary-first-names.php, since it's very long: https://pastebin.com/cRFkR4fh