
I am using this function package: http://nadeausoftware.com/articles/2008/05/php_tip_how_parse_and_build_urls

But if we do this:

$x =  url_to_absolute('http://al-mashhad.com/News/النيابة-تستمع-لأقوال-خالد-يوسف-في-بلاغه-ضد-أبو-إسم/141274.aspx','../../Media/News/2012/12/16/2012-634912584761067771-106.jpg');

var_dump($x);

it returns false, because these functions do not support Arabic characters.

Specifically, the problem is in this function:

function split_url( $url, $decode=TRUE )
{
    // Character sets from RFC3986.
    $xunressub     = 'a-zA-Z\d\-._~\!$&\'()*+,;=';
    $xpchar        = $xunressub . ':@%';

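    // Scheme from RFC3986.  The hyphen is escaped so that "+-." is not
    // read as a character range (which would also admit a comma).
    $xscheme        = '([a-zA-Z][a-zA-Z\d+.\-]*)';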

    // User info (user + password) from RFC3986.
    $xuserinfo     = '((['  . $xunressub . '%]*)' .
                     '(:([' . $xunressub . ':%]*))?)';

    // IPv4 from RFC3986 (without digit constraints).
    $xipv4         = '(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})';

    // IPv6 from RFC2732 (without digit and grouping constraints).
    $xipv6         = '(\[([a-fA-F\d.:]+)\])';

    // Host name from RFC1035.  Technically, must start with a letter.
    // Relax that restriction to better parse URL structure, then
    // leave host name validation to application.
    $xhost_name    = '([a-zA-Z\d-.%]+)';

    // Authority from RFC3986.  Skip IP future.
    $xhost         = '(' . $xhost_name . '|' . $xipv4 . '|' . $xipv6 . ')';
    $xport         = '(\d*)';
    $xauthority    = '((' . $xuserinfo . '@)?' . $xhost .
                 '?(:' . $xport . ')?)';

    // Path from RFC3986.  Blend absolute & relative for efficiency.
    $xslash_seg    = '(/[' . $xpchar . ']*)';
    $xpath_authabs = '((//' . $xauthority . ')((/[' . $xpchar . ']*)*))';
    $xpath_rel     = '([' . $xpchar . ']+' . $xslash_seg . '*)';
    $xpath_abs     = '(/(' . $xpath_rel . ')?)';
    $xapath        = '(' . $xpath_authabs . '|' . $xpath_abs .
             '|' . $xpath_rel . ')';

    // Query and fragment from RFC3986.
    $xqueryfrag    = '([' . $xpchar . '/?' . ']*)';

    // URL.
    $xurl          = '^(' . $xscheme . ':)?' .  $xapath . '?' .
                     '(\?' . $xqueryfrag . ')?(#' . $xqueryfrag . ')?$';

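    // Split the URL into components.
    if ( !preg_match( '!' . $xurl . '!', $url, $m ) )
        return FALSE;

    // Start from an empty array so an empty URL does not return an
    // undefined variable.
    $parts = array();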

    if ( !empty($m[2]) )        $parts['scheme']  = strtolower($m[2]);

    if ( !empty($m[7]) ) {
        if ( isset( $m[9] ) )   $parts['user']    = $m[9];
        else            $parts['user']    = '';
    }
    if ( !empty($m[10]) )       $parts['pass']    = $m[11];

    if ( !empty($m[13]) )       $h=$parts['host'] = $m[13];
    else if ( !empty($m[14]) )  $parts['host']    = $m[14];
    else if ( !empty($m[16]) )  $parts['host']    = $m[16];
    else if ( !empty( $m[5] ) ) $parts['host']    = '';
    if ( !empty($m[17]) )       $parts['port']    = $m[18];

    if ( !empty($m[19]) )       $parts['path']    = $m[19];
    else if ( !empty($m[21]) )  $parts['path']    = $m[21];
    else if ( !empty($m[25]) )  $parts['path']    = $m[25];

    if ( !empty($m[27]) )       $parts['query']   = $m[28];
    if ( !empty($m[29]) )       $parts['fragment']= $m[30];

    if ( !$decode )
        return $parts;
    if ( !empty($parts['user']) )
        $parts['user']     = rawurldecode( $parts['user'] );
    if ( !empty($parts['pass']) )
        $parts['pass']     = rawurldecode( $parts['pass'] );
    if ( !empty($parts['path']) )
        $parts['path']     = rawurldecode( $parts['path'] );
    if ( isset($h) )
        $parts['host']     = rawurldecode( $parts['host'] );
    if ( !empty($parts['query']) )
        $parts['query']    = rawurldecode( $parts['query'] );
    if ( !empty($parts['fragment']) )
        $parts['fragment'] = rawurldecode( $parts['fragment'] );
    return $parts;
}

My question is: how can I change the regex so that it supports Arabic characters in URLs?

1 Answer


The URL you show is not really a valid URL. Only ASCII characters are allowed in a URL; anything else has to be percent-encoded. Browsers display the decoded characters anyway as a convenience.

Run urlencode() on the URL first, which will turn the Arabic characters into %xx escape sequences; then run your function on the result.

A modern browser will show the Arabic characters automatically even if you do this.

Pekka
  • It will not work. I think we have to encode only the Arabic characters, because if we run urlencode on the whole URL, the function will not work – علاء محمد Dec 17 '12 at 10:46
  • @علاء محمد oh, indeed. You're right. Hmmmmm, I can't think of a quick way to do that - one would have to split the whole thing manually and encode only the special (non-ASCII) characters. – Pekka Dec 17 '12 at 10:48
  • Do you know of any function that turns the Arabic characters into %xx entities? – علاء محمد Dec 17 '12 at 11:09
  • @علاء محمد no, not offhand I'm afraid. One would have to split the path into its components and then run a `urlencode()` on every one. Try searching though, I'm sure someone else has had this problem before – Pekka Dec 17 '12 at 11:16
  • Do you know the terminology for %xx entities, so that I can search for it? – علاء محمد Dec 17 '12 at 11:19
  • @علاء محمد percent encoding or URL encoding. – Pekka Dec 17 '12 at 11:25
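
For what the comments describe (encoding only the non-ASCII characters and leaving the rest of the URL alone), here is a minimal sketch. The helper name encode_non_ascii is hypothetical and not part of the nadeausoftware package, and the sketch assumes the input string is UTF-8. It matches runs of non-ASCII bytes and passes each run through rawurlencode(), so delimiters such as ':', '/', '?' and '#' are never touched.

```php
<?php
// Hypothetical helper (not part of the nadeausoftware package):
// percent-encode only the non-ASCII bytes of a URL.
// Assumption: $url is a UTF-8 string.
function encode_non_ascii( $url )
{
    // No /u modifier on purpose: the pattern then works on raw bytes,
    // so every byte of a multi-byte character is matched and encoded.
    return preg_replace_callback(
        '/[^\x00-\x7F]+/',
        function ( $m ) { return rawurlencode( $m[0] ); },
        $url
    );
}

// The URLs from the question (this file must be saved as UTF-8).
$base = 'http://al-mashhad.com/News/النيابة-تستمع-لأقوال-خالد-يوسف-في-بلاغه-ضد-أبو-إسم/141274.aspx';
$rel  = '../../Media/News/2012/12/16/2012-634912584761067771-106.jpg';

$encoded = encode_non_ascii( $base );
var_dump( $encoded );

// Only call the package function if it has been loaded.
if ( function_exists( 'url_to_absolute' ) )
    var_dump( url_to_absolute( $encoded, $rel ) );
```

After this step the base URL contains only ASCII, and '%' is already in the $xpchar character set, so split_url() should be able to match it without any change to the regex. Note that split_url() decodes the path again when $decode is TRUE, so the parts it returns may contain the Arabic characters once more.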