
My PHP script is receiving data from an HTML input as an array:

<input type="checkbox" name="value[]" value="1" />  
<input type="checkbox" name="value[]" value="2" />  
<input type="checkbox" name="value[]" value="3" />
...

My current method of serialization is as such:

<?php
class DataClass {
private $separator = ",";
private function sanitize($input){
    $patterns = array(
        '/^'.$this->separator.'/',
        '/'.$this->separator.$this->separator.'/',
        '/'.$this->separator.'$/'
        );
    $replacements = array(
        '',
        $this->separator,
        ''
        );
    return preg_replace($patterns, $replacements, $input);
}
public function deserialize($input){
    return explode($this->separator, $this->sanitize($input));
}
private function serialize($input){
    $bucket = array();
    if(is_array($input)):
        foreach($input as $value){
            if(!empty($value)):
                array_push($bucket, $value);
            endif;
        }
        return $this->sanitize(implode($this->separator, $bucket));
    else:
        return "";
    endif;
}

public function store($formdata) {
    $formdata['value'] = empty($formdata['value']) ? '' : $this->serialize($formdata['value']);
    // save $formdata to database
}

}
?>

So, after reading this post my questions are these:

  1. What is the best way to optimize the efficiency of the existing code?
  2. Would there be any performance benefits to using a different means of converting the simple indexed array into something that can be stored in the database?
    1. How would using a BLOB column type compare with the varchar(255) that I am getting away with currently?
    2. Since it is not an associative array, would json_encode even be the right method, or is it just overkill?
oomlaut

2 Answers


Generally speaking, serialize() or json_encode() would be preferred because of their companion functions: it's really easy to get the data back, exactly as you started, by using unserialize() or json_decode().
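As a minimal sketch of that round trip (the sample array `$checked` is an assumption standing in for the submitted checkbox values):

```php
<?php
// Hypothetical sample standing in for $_POST['value'] from the form above.
$checked = array("1", "3");

// PHP's native format: the companion function restores the array as it was.
$native = serialize($checked);
$fromNative = unserialize($native);

// JSON: pass true as the second argument to get arrays back instead of objects.
$json = json_encode($checked);
$fromJson = json_decode($json, true);

var_dump($fromNative === $checked); // bool(true)
var_dump($fromJson === $checked);   // bool(true)
```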

That said, with a simple array like the one you're dealing with, a simple comma-separated list is fine. There are, however, a few improvements you can make. But let me be clear: what you're trying to accomplish is so simple that processing efficiency is not a genuine concern. What is a concern is clarity and maintainability, and that's where we'll make our gains.

  1. If all of your checkboxes have non-empty values (e.g. none of them are set up like this: <input type="checkbox" name="value[]" value="" />), then you probably shouldn't need to remove empty values from your list. No modern browser that I'm aware of sends any data for checkboxes that aren't checked. That said...
  2. It's still good form to do so. But you're doing that twice over, both before you implode and after. You don't need sanitize() at all. But more than that...
  3. There are still more improvements we can make that both expand the functionality of your simple serialization and make it simpler to interact with. All while taking advantage of all the speed that PHP has to offer...

<?php
private function serialize($input) {
    // I've changed this, generally speaking you probably
    // don't actually want "serialize" to turn a non-array
    // variable into an empty string
    if (is_scalar($input)) return $input;

    // we still test for an array, because it could potentially
    // be something else...
    elseif (is_array($input)) {
        $bucket = array();
        foreach ($input as $value) {
            // empty() will match 0, so that's probably not
            // what we want... if we test for length, then
            // it'll match anything we might reasonably want
            // to record
            if (strlen($value)) $bucket[] = $value;
        }
        return implode($this->separator, $bucket);
    }

    // if we've got an object, we just serialize with our
    // chosen method: serialize() or json_encode()
    elseif (is_object($input)) return serialize($input);

    // otherwise, we have no reasonable way to serialize
    else return ''; // we might also return NULL
}
?>

... we can make it even simpler if we know certain things about the data. If, for instance, we know all of the values are going to be unique, we can be very direct about what we're looking for:

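<?php
elseif (is_array($input)) {
    // array_values() reindexes after array_unique(), so the key
    // returned by array_search() is a valid offset for array_splice()
    $input = array_values(array_unique($input));
    // strict comparison, so that only '' is removed and 0 is kept
    $empty = array_search('', $input, true);
    if ($empty !== false) array_splice($input, $empty, 1);
    return implode($this->separator, $input);
}
?>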
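Putting the pieces together, a comma-separated round trip can be sketched with built-in functions alone. The function names `toList` and `fromList` and the sample input are hypothetical, and the sketch assumes no value ever contains the separator:

```php
<?php
// Hypothetical helpers; assumes no value contains the separator itself.
function toList(array $values, $separator = ",") {
    // Keep anything with length, so "0" survives while "" is dropped.
    $kept = array_filter($values, function ($v) { return strlen($v) > 0; });
    return implode($separator, $kept);
}

function fromList($list, $separator = ",") {
    // explode() on "" would return array(""), so handle the empty list explicitly.
    return $list === "" ? array() : explode($separator, $list);
}

$stored = toList(array("1", "", "0", "3"));
echo $stored, "\n";            // 1,0,3
print_r(fromList($stored));    // the values "1", "0", "3"
```

Handling the empty string separately in `fromList` matters because an unchecked form would otherwise come back as a list containing one empty value.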
Jason

First off, to solve the issue of missing data elements and to normalize inputs, I always use array_merge to set up defaults. For example:

$defaults = array("checkbox1"=>false, "checkbox2"=>0);
$data = array_merge($defaults, $_POST);
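To illustrate how the merge behaves, here is a small sketch with a hypothetical `$submitted` array standing in for `$_POST`. Keys present in the submission override the defaults, and missing ones (such as an unchecked checkbox) fall back:

```php
<?php
$defaults = array("checkbox1" => false, "checkbox2" => 0);

// Hypothetical submission: only checkbox2 was checked.
$submitted = array("checkbox2" => "1");

// For string keys, later arrays overwrite earlier ones.
$data = array_merge($defaults, $submitted);

var_dump($data["checkbox1"]); // bool(false), taken from the defaults
var_dump($data["checkbox2"]); // string(1) "1", taken from the submission
```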

I would say switch over to JSON as a form of serialization, and here are the reasons:

  1. JSON is fast to encode and decode. In tests that I have performed, json_encode had a 35% speed advantage over serialize for simple array structures.

  2. JSON supports nested data structures, unlike flat text files, so it would be a little more future-proof.

  3. JSON allows named labels, which at least give a clue about what is happening, and it converts to an associative array (pass true as the second argument to json_decode). That is helpful if you are dealing with a dataset with a lot of keys: two months later, $a["profile"]["header_font"] makes a whole lot more sense than $a[12][4].

  4. JSON-serialized data is smaller than the output of serialize (but larger than a comparable flat file).

  5. JSON is portable. Only PHP can read the results of serialize, while damn near everything can read JSON, which allows other languages or tools to use the data.
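A quick way to check points 3 and 4 against your own data is to encode the same structure both ways. The `$settings` array below is a made-up example, and no timing is measured here:

```php
<?php
// Made-up nested structure, for illustration only.
$settings = array("profile" => array("header_font" => "Georgia"));

$json   = json_encode($settings);
$native = serialize($settings);

echo $json, "\n";   // {"profile":{"header_font":"Georgia"}}
echo strlen($json), " bytes as JSON, ", strlen($native), " bytes via serialize()\n";

// With true as the second argument, json_decode returns associative arrays.
$a = json_decode($json, true);
echo $a["profile"]["header_font"], "\n"; // Georgia
```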

As for saving to the database, I have always just used the TEXT MySQL data type. It allows up to 2^16 - 1 (65,535) bytes, and I have stored more than I should in these columns without running into any problems.

That's my two cents.

Orangepill
  • For the general case I agree and I've started using `json_encode` myself. But if he's in a scenario where a simple one dimensional array (of numbers, if the example isn't dramatically simplified) is guaranteed, then using a comma delimited list would be a fine (and simple to access, portable and readable, and space efficient) way to handle it. – Jason May 24 '13 at 12:32