Regex [a-zA-Z0-9] does not work?

Question

I'm writing some code in PHP. I want to grab the words that contain only alphabetic and numeric characters, but I can't get it to work. My code is below. I actually found this regex on this site (Allow only [a-z][A-Z][0-9] in string using PHP).

$fp = fopen('C:\wamp\www\curl\5510.doc','w');
fwrite($fp, $data); 
fclose($fp);
$file = doc2text('C:\wamp\www\curl\5510.doc');
@preg_match_all("/^[a-zA-Z0-9]+$/", file_get_contents($file), $fileOnlyAlphabetic);
print_r($fileOnlyAlphabetic);

And here is the result:

Array ( [0] => Array ( ) [1] => Array ( ) )

Please help me ...

What is the string that should be matched in your opinion but does not get matched? — jedrzej.kurylo, Aug 06 '15 at 07:54
But when I write words side by side, it doesn't work. @Tushar — Muhammed Yusuf Taşkesenligil, Aug 06 '15 at 07:58
@jedrzej.kurylo there is the string " sosyal sigortalar ve genel salik sigortasi kanunu kabul tarihi 3i 5 2ii6 yayimlandii r gazete tarih i6 6 2ii6 sayi 262ii yayimlandii dstur tertip 5 cilt 45 birinci kisim madde i bu kanunun amaci sosyal sigortalar ile genel salik sigortasi bakimindan" — Muhammed Yusuf Taşkesenligil, Aug 06 '15 at 08:00
@MuhammedYusufTaşkesenligil: Post the full `doc2text` code if `preg_match_all("/[a-z0-9]+/i", $file, $fileOnlyAlphabetic);` does not work. Also please post a part of what `echo $file;` prints. — Wiktor Stribiżew, Aug 06 '15 at 08:51

castarco · Answer 1 · 2015-08-06T10:13:00.880

1

First, you should avoid putting the '@' symbol in front of the preg_match_all call, because it hides potential errors.
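As a minimal sketch of what checking for errors could look like without '@' (the sample string and variable names are made up for illustration, they are not from the question):

```php
<?php
// Hypothetical sample input, standing in for the extracted document text.
$text = "abc 123 def";

// No '@' here: warnings stay visible and the return value is checked.
$count = preg_match_all('/[a-zA-Z0-9]+/', $text, $matches);

if ($count === false) {
    // preg_last_error() returns a PREG_* error code describing the failure.
    echo "Regex failed with error code ", preg_last_error(), "\n";
} else {
    echo "Found $count matches\n";
    print_r($matches[0]);
}
```

preg_match_all returns the number of full matches, or false on failure, so a strict comparison with false separates "no matches" from "the regex engine failed".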

Second, the text extracted from a .doc file almost certainly contains spaces, punctuation, or non-printable symbols, so it will never consist of alphanumeric characters only. The code runs fine, but the pattern is not a good fit.

You should remove both the ^ and the $ from the regexp. Without the m modifier, ^ anchors the match to the start of the whole subject string and $ to its end (or just before a final newline), so the pattern only matches when the entire string is alphanumeric. With the m modifier they would match at the start and end of each line instead.
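A small sketch of the difference the anchors make, using a made-up sample string that contains spaces:

```php
<?php
// Hypothetical sample text with spaces between the words.
$text = "sosyal sigortalar 2006";

// Anchored: the whole string must be alphanumeric, so the spaces prevent any match.
$anchored = preg_match_all('/^[a-zA-Z0-9]+$/', $text, $m1);

// Unanchored: each run of alphanumeric characters is matched separately.
$unanchored = preg_match_all('/[a-zA-Z0-9]+/', $text, $m2);

var_dump($anchored);   // int(0)
print_r($m2[0]);       // the three words: sosyal, sigortalar, 2006
```

The anchored call yields an empty result array, which is the same shape of output as the empty arrays shown in the question.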

It is also likely that doc2text returns the file's content, not its name or a file descriptor, so you should also remove the file_get_contents call inside preg_match_all.

Try something like this:

$fp = fopen('C:\wamp\www\curl\5510.doc','w');
fwrite($fp, $data); 
fclose($fp);
$file = doc2text('C:\wamp\www\curl\5510.doc');
preg_match_all("/[a-zA-Z0-9]+/", $file, $fileOnlyAlphabetic);
print_r($fileOnlyAlphabetic);
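If it is unclear whether doc2text gives back a path or the text itself, one possible guard is sketched below. The sample value and the variable names $text and $words are assumptions for illustration only:

```php
<?php
// Hypothetical stand-in for the return value of doc2text().
$file = "sosyal sigortalar ve genel";

// Only read from disk when the value really names an existing file;
// otherwise treat it as the text to search.
$text = is_file($file) ? file_get_contents($file) : $file;

preg_match_all('/[a-zA-Z0-9]+/', $text, $words);
print_r($words[0]);
```

This avoids passing document text to file_get_contents, which would fail because no file with that name exists.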

Hope it helps.

edited Aug 06 '15 at 10:13

answered Aug 06 '15 at 07:55


Thanks for your comment, but the result didn't change – Muhammed Yusuf Taşkesenligil Aug 06 '15 at 08:08
Where did you find the doc2text function? – castarco Aug 06 '15 at 08:10
I think it's because the doc file code is not working correctly – Muhammed Yusuf Taşkesenligil Aug 06 '15 at 08:10
A colleague of mine wrote the doc2text function @castarco – Muhammed Yusuf Taşkesenligil Aug 06 '15 at 08:11
I think the errors are in the previous code, from the fopen statement to the doc2text statement. I don't know why you are opening and closing files... in any case, are you sure that $file is really a file? – castarco Aug 06 '15 at 08:13
I am sure there is no error in the previous code, and when I write 'echo $file;' I can see the result on the page – Muhammed Yusuf Taşkesenligil Aug 06 '15 at 08:18
OK, then there is an error: $file holds the contents, not the filename, so file_get_contents($file) returns nothing. Pass $file to preg_match_all directly, without the file_get_contents call. – castarco Aug 06 '15 at 08:19
Yes, that's right. I tried without file_get_contents, but the result didn't change – Muhammed Yusuf Taşkesenligil Aug 06 '15 at 08:25
Please provide more info, then. Edit the post to add the output of the echo calls and some debug info, or it will be impossible to help you. – castarco Aug 06 '15 at 08:26
What was the problem? – castarco Aug 06 '15 at 10:11

Muhammed Yusuf Taşkesenligil · Answer 2 · 2015-08-06T10:19:01.013

@stribizhev Here is the code you asked for:

function doc2text($userDoc) {
 $fileHandle = fopen($userDoc, 'r');
 $word_text = @fread($fileHandle, filesize($userDoc));
 $line = "";
 $tam = filesize($userDoc);
 $nulos = 0;
 $caracteres = 0;
 for($i=1536; $i<$tam; $i++)
 {
    $line .= $word_text[$i];

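    // Count consecutive NUL bytes; comparing a character to the integer 0 with == is unreliable
    if( $word_text[$i] === chr(0) )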
    {
        $nulos++;
    }
    else
    {
        $nulos=0;
        $caracteres++;
    }

    if( $nulos>1996)
    {   
        break;  
    }
}

//echo $caracteres;

$lines = explode(chr(0x0D),$line);
//$outtext = "<pre>";

$outtext = "";
foreach($lines as $thisline)
{

    $tam = strlen($thisline);
    if( !$tam )
    {
        continue;
    }

    $new_line = ""; 
    for($i=0; $i<$tam; $i++)
    {
        $onechar = $thisline[$i];
        if( $onechar > chr(240) )
        {
            continue;
        }

        if( $onechar >= chr(0x20) )
        {
            $caracteres++;
            $new_line .= $onechar;
        }

        if( $onechar == chr(0x14) )
        {
            $new_line .= "</a>";
        }

        if( $onechar == chr(0x07) )
        {
            $new_line .= "\t";
            if( isset($thisline[$i+1]) )
            {
                if( $thisline[$i+1] == chr(0x07) )
                {
                    $new_line .= "\n";
                }
            }
        }
    }
    // replace with a hyperlink
    $new_line = str_replace("HYPERLINK" ,"<a href=",$new_line); 
    $new_line = str_replace("\o" ,">",$new_line); 
    $new_line .= "\n";

    // image links
    $new_line = str_replace("INCLUDEPICTURE" ,"<br><img src=",$new_line); 
    $new_line = str_replace("\*" ,"><br>",$new_line); 
    $new_line = str_replace("MERGEFORMATINET" ,"",$new_line); 

    $new_line = @iconv('UTF-8', 'ISO-8859-9', $new_line);
    $new_line = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", ' ', $new_line);
    $new_line = mb_strtolower(trim($new_line, '-'),'UTF-8');
    $new_line = preg_replace("/[\/_|+ -]+/", " ", $new_line);
    $new_line = preg_replace("/[0]/", "i", $new_line);
    $new_line = preg_replace("/[1]/", "i", $new_line);
    $outtext .= $new_line;
}
return $outtext;
}

And here is the output of `echo $file`. (Because the output is very large, I posted a screenshot.)

score 0 · Answer 3 · answered Aug 06 '15 at 10:25

I solved the problem. When I use this code, I get the correct result: `@preg_match_all("/\b(([a-z0-9]+))\b/", $file, $fileOnlyAlphabetic);` Thanks for your answers. I am so happy :)
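For illustration, here is a sketch of that pattern applied to a fragment of the sample text quoted in the comments on the question. The variable names $withGroups and $plain are made up for this example:

```php
<?php
// Fragment of the sample text posted in the comments on the question.
$file = "kabul tarihi 3i 5 2ii6 yayimlandii";

// Pattern from this answer: the two nested capture groups mean the
// result holds three lists (full match, group 1, group 2) with the same words.
preg_match_all('/\b(([a-z0-9]+))\b/', $file, $withGroups);

// Same pattern without the groups: only index 0 is filled.
preg_match_all('/\b[a-z0-9]+\b/', $file, $plain);

print_r($plain[0]);
```

On text like this, which contains only lowercase letters, digits, and spaces, the groups add nothing to the match itself, so the simpler pattern should return the same words. Note that \b treats underscores and uppercase letters as word characters, so the two patterns would behave differently from a plain `[a-z0-9]+` on input that contains them.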

answered Aug 06 '15 at 10:25

Muhammed Yusuf Taşkesenligil

