I'm coding a basic mailing-list system for our website. The "subscribe.php" page reads its parameters from $_GET. I store the email addresses in a text file (maillist.txt).

Before adding the address, I check that it's not in the file yet.

Problem: comparing two strings that look identical returns false.

What I've tried:

  • I made sure that maillist.txt is in UTF-8
  • I tried setting the Content-Type header to UTF-8
  • I tried using strcmp()
  • I tried converting both strings with utf8_encode

Here is the "subscribe.php" code (I removed all regex and isset checks):

<?php
    // UTF-8 ----> things I've added, trying to solve the problem
    header('Content-Type: text/html; charset=utf-8');
    ini_set('default_charset', 'utf-8');
    ini_set("auto_detect_line_endings", true);

    $email = strip_tags($_GET['email']); // For safety

    $maillist = fopen('maillist.txt', 'r+');

    // Check if email is already in the database
    $insert = true;
    while ($line = fgets($maillist)) {
        $line = rtrim($line, "\r\n");
        if (utf8_encode($line) == utf8_encode($email)) { // $line == $email returns false
            echo $line . "=" . $email . "<br/>";
            $insert = false;
            break;
        } else echo $line . "!=" . $email . "<br/>";
    }

    if ($insert) {
        fputs($maillist, $email . "\n");
        echo 'Success';
    } else echo "Fail";

    fclose($maillist);
?>
Kevin Ji
user2154283
  • You might want to close the double quote `else echo "Fail;` <--- there – Sterling Archer Aug 15 '13 at 23:48
  • What does it show when you run `echo $line . "!=" . $email . "<br/>";`? – Prix Aug 15 '13 at 23:49
  • +1 for **clearly** saying what you have tried. – Jordan Aug 15 '13 at 23:50
  • @Prix: it shows "email_1@email.com!=email_1@email.com". PRPGFerret: I edited; that mistake came from the edit I made to post the code here. – user2154283 Aug 15 '13 at 23:53
  • Without access to the resources themselves I can't really give a good guess. But for the sake of debugging, have you tried `filter_var($line, FILTER_SANITIZE_EMAIL);` to filter out any characters that are not valid in an email address? If that doesn't work, try hashing the line and the email, and compare the hashes for an email/line combination that SHOULD match but doesn't (you might need to add an incrementing line counter, so you can pick the line of the file to hash and output). – Lee Aug 15 '13 at 23:53
  • I don't think you should concern yourself with the encoding of the file. The emails should only have letters, numbers and underscores, otherwise they are invalid, so you should do as Lee suggested or filter them yourself. Also use trim instead of rtrim; you don't want whitespace on either side. [**See this for more information.**](http://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-email-address) – Prix Aug 15 '13 at 23:55
  • Provide us with some emails from the list to test :p – Hanlet Escaño Aug 15 '13 at 23:56
  • Does `$line === utf8_encode($email)` work? – Kevin Ji Aug 15 '13 at 23:57
  • Run `xxd maillist.txt` and see what exactly is in the line you're looking at. Does it contain anything apart from the expected characters between the `0a` bytes? (newlines) – viraptor Aug 15 '13 at 23:59
  • Also remove the character list from your rtrim call; you could be leaving other hidden characters in there. By default rtrim already trims the two characters you specified, so let it handle those plus the ones you don't have listed. I also tend to always use trim, as Prix suggested. In this case there shouldn't logically be any characters at the start of the string, but I'd treat it as "is there a specific reason I don't want to use trim?" If not, use it, regardless of whether there should be any characters at the start, just to make sure. – Lee Aug 16 '13 at 00:00
  • After doing some looking, it seems using `===` is preferable when comparing strings, so I would use that. I am not sure it is causing your bug here, but it can cause problems down the road. Who knows, it might even be the issue you are asking about :) – Jordan Aug 16 '13 at 00:01
  • If maillist.txt is already in UTF-8 you should NOT call utf8_encode() on it. – Langdi Aug 16 '13 at 00:01
  • Are the strings the same *length*? That's a pretty easy test to check there's no silly characters hiding in there. – paddy Aug 16 '13 at 00:01
  • @Lee: I tried filtering like this: `filter_var($line, FILTER_SANITIZE_EMAIL); filter_var($email, FILTER_SANITIZE_EMAIL); if ($line == $email) { ...` but it didn't work. I tried trim instead of rtrim; that didn't work either. – user2154283 Aug 16 '13 at 00:04
  • Assuming you did `$email=filter_var($email, FILTER_SANITIZE_EMAIL);`? you need to capture the return value of that function and compare that, filter_var returns the filtered string. – Lee Aug 16 '13 at 00:07
  • OK, now it gets interesting: the file contains the email "email_1@email.com" twice, and one copy compares equal while the other doesn't. EDIT: I captured the return value of filter_var, and now both copies are detected as equal. – user2154283 Aug 16 '13 at 00:09
  • @Lee : why wasn't it working with that filtering ? – user2154283 Aug 16 '13 at 00:17
  • Finding the difference between two strings is easier when you use `bin2hex()` on both strings and inspect. – Ja͢ck Aug 16 '13 at 05:51
  • @user2154283: if filter_var fixed it, the comparison was probably failing because of hidden characters. Hidden characters can be a menace when dealing with files, as the OS and/or software can add things such as null bytes. They can also end up in the middle of a string, where rtrim would not remove them. It is a rare and confusing situation, though. – Lee Aug 16 '13 at 11:33

3 Answers

Taking a shot in the dark here...

  1. First, store your values in variables and reuse those; you may be printing something different from what you're comparing.

  2. Trim those variables to make sure there is no extraneous whitespace before or after.

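    // Normalise the submitted address once, before the loop
    $email = trim($email);

    // "!== false" keeps the loop going even if a line is "0" or empty
    while (($line = fgets($maillist)) !== false) {
        // trim() also strips the trailing "\r" and "\n".
        // utf8_encode() is dropped: maillist.txt is already UTF-8, and
        // encoding it again would corrupt any non-ASCII characters.
        $lineValue = trim($line);

        // compare the trimmed values, and print exactly what was compared
        if ($lineValue == $email) {
            echo $lineValue . "==" . $email . "<br/>";
            $insert = false;
            break;
        } else {
            echo $lineValue . "!=" . $email . "<br/>";
        }
    }
    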

That may not even be your problem, but it's a good place to start if the two strings look the same to your eyes.
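If the strings still look identical after trimming, comparing their lengths and hex dumps (as suggested in the comments with `bin2hex()`) usually shows what differs. Here is a minimal sketch; `describeBytes()` is a hypothetical helper, and the sample line with a UTF-8 byte order mark is only an assumed example of a hidden character, not something confirmed for the file in the question.

```php
<?php
// Hypothetical helper: show the byte length and hex dump of a string
// so that invisible characters become visible.
function describeBytes(string $s): string
{
    return strlen($s) . ' bytes: ' . bin2hex($s);
}

// Assumed sample data: the stored line starts with a UTF-8 BOM (EF BB BF).
$line  = "\xEF\xBB\xBFemail_1@email.com";
$email = "email_1@email.com";

echo describeBytes($line), "\n";
echo describeBytes($email), "\n";

var_dump($line === $email); // bool(false): the extra leading bytes make the strings differ
```

Any bytes that appear in one dump but not the other are the characters the comparison is tripping over.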

Kristian
  1. Store your values in variables.

  2. Use trim on those variables to make sure there is no extra whitespace before or after.

    // Normalise the submitted address once, before the loop
    $email = trim($email);

    // "!== false" keeps the loop going even if a line is "0" or empty
    while (($line = fgets($maillist)) !== false) {
        // trim() also strips the trailing "\r" and "\n".
        // utf8_encode() is dropped: maillist.txt is already UTF-8, and
        // encoding it again would corrupt any non-ASCII characters.
        $lineValue = trim($line);

        // "===" means "identical": true if both values are equal and of the same type
        if ($lineValue === $email) {
            echo $lineValue . "===" . $email . "<br/>"; // print the values that were compared
            $insert = false;
            break;
        } else {
            echo $lineValue . "!==" . $email . "<br/>";
        }
    }
    
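As a small illustration of why `===` is the safer habit for strings: with `==`, PHP compares two numeric-looking strings as numbers. The values below are made-up examples; an email address contains `@` and is not numeric, so this is probably not the cause of the mismatch in the question.

```php
<?php
// Two different strings that both look like numbers (made-up sample values).
$a = "1e3";
$b = "1000";

var_dump($a == $b);  // bool(true): both are numeric strings, so they are compared as numbers
var_dump($a === $b); // bool(false): the strings differ byte for byte
```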
Ajeesh

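To summarize everything that was said:

The problem was that the stored lines contained characters that are not valid in an email address, and I wasn't filtering them out. I fixed it by filtering both variables with filter_var($line, FILTER_SANITIZE_EMAIL). Note that filter_var returns the filtered string, so the return value has to be assigned and compared; calling it without capturing the result changes nothing.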
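For anyone landing here, this is roughly what the working comparison looks like. It is a sketch, not my exact code: `cleanEmail()` and `isSubscribed()` are names I made up for this example, and the sample lines (including the byte order mark on the first one) are assumed data standing in for maillist.txt.

```php
<?php
// Hypothetical helper: strip every character that FILTER_SANITIZE_EMAIL does not allow.
function cleanEmail(string $raw): string
{
    // filter_var() returns the filtered string (or false on failure);
    // it does not modify $raw, so the return value must be captured.
    $clean = filter_var($raw, FILTER_SANITIZE_EMAIL);
    return $clean === false ? '' : $clean;
}

// Hypothetical helper: check whether $email already appears in the stored lines.
function isSubscribed(array $lines, string $email): bool
{
    $needle = cleanEmail($email);
    foreach ($lines as $line) {
        if (cleanEmail($line) === $needle) {
            return true;
        }
    }
    return false;
}

// Assumed sample data; the first line carries a UTF-8 BOM and a trailing newline.
$lines = ["\xEF\xBB\xBFemail_1@email.com\n", "email_2@email.com\n"];

var_dump(isSubscribed($lines, 'email_1@email.com')); // bool(true)
var_dump(isSubscribed($lines, 'email_3@email.com')); // bool(false)
```

With a real file, the array could come from `file('maillist.txt')`, which returns the lines of the file as an array.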

user2154283