1

how do you parse a csv file into a mysql database with php, From bits I have read from php documents I am not sure how or if its possible to add conditions to the lines that you are reading into the database so that only values that meet the condition are passed into the database.

for instance I currently have a csv file that looks like this

0001D,5879
0001E,0521
0001F,4587
0001G,2220
0001H,2482
0001I,9087
0001J,2255
0001K,2247

the codes before the comma are supposed to only be hex values, what I would like to do is exclude any line (so hex value and its code) that dose not represent a hex code (0-9)-(A-F)... any idea on how the could be accomplished?

C.Johns
  • 10,185
  • 20
  • 102
  • 156
  • 1
    Why don't you just use something like sed to preprocess the file and then use MySQL's built-in functionality to load the resulting CSV? – dave Jun 27 '11 at 03:06
  • You might find some help getting started from the answers in this thread, [PHP not reading the entire CSV file](http://stackoverflow.com/questions/6368138/php-not-reading-the-entire-csv-file/6368176#6368176). –  Jun 27 '11 at 03:20
  • hrmm, I think its related to an experience thing.. I'm still new to this stuff so obviously have come to this conclusion purely on a lack of experience... :( do you have an example or explanation of how I could attempt to achieve my goal with your process? – C.Johns Jun 27 '11 at 03:22
  • @C.Johns may I ask which version of PHP you're using? The 5.2.x branch or the 5.3.x branch? –  Jun 27 '11 at 03:31
  • I am using MAMP 10.6 which runs PHP 5.2.13 & 5.3.2.. according to the website http://www.mamp.info/en/mamp/index.html – C.Johns Jun 27 '11 at 03:40
  • ahh php 5.3.5 sorry was a pain to find where it was set. – C.Johns Jun 27 '11 at 03:52
  • @C.Johns it's okay, I wrote you code that will work in both anyway. –  Jun 27 '11 at 04:11

2 Answers2

0

This code uses the fgetcsv function to parse out one line at a time from a csv file, putting each delimited field into an array, and returning that array. With your example csv, the array returned will only contain two fields.

Next, after checking to make sure the return value from the fgetcsv isn't false or null, the code checks to see if the first field of the array contains any non-hex characters in the range [G-Z] using preg_match. If it doesn't, then it will put the two fields of the array (representing a line from you csv file) into a new array, which you can operate on later.

<?php
$fileName = "test.csv";
$hexLines = array();

if (($file = fopen($fileName, "r")) !== false)
{
    while (($line = fgetcsv($file, 0, ",")) !== false)
    {
        if (!empty($line))
        {
            // IF first field only has hex chars
            if (preg_match('/^[0-9A-F]+$/', $line[0]) === 1) 
            {
                $hexLines[] = array($line[0], $line[1]);
            }
        }
    }
    fclose($file);
}
print_r($hexLines);

By the way, the above code will work in both PHP versions 5.2.x and 5.3.x. If you were 100% sure that you were going to be using 5.3, I would've written code using SplFileObject instead, since it's object-oriented, and maybe a little easier to use than the above functions.

  • okay cool thanks man, give me abit of time before I accept your answer as I want to run through and understand exactly whats going on. Thanks for your reply. – C.Johns Jun 27 '11 at 04:18
  • I take issue with this line: `if (preg_match('/[G-Z]+/', $line[0]) != 1)`. You should be more aggressive with your checks and whitelist instead of blacklist. You're better with a test for 0-9A-F. – cwallenpoole Jun 27 '11 at 04:29
  • @C.Johns you might want to check out cwallen's answer too, he might be right. –  Jun 27 '11 at 04:30
  • @cwallen you have to forgive me, I'm a little slow, I don't understand right away why what you suggest is better, is that something you can explain to me? –  Jun 27 '11 at 04:38
  • @cwallen never mind, I can see now how what you suggest is more safe, I'll update my answer to reflect that. –  Jun 27 '11 at 04:45
  • btw This worked perfectly, all I have to do now is pass it over to either the database or to a text file then pass the text file to the database :) – C.Johns Jun 27 '11 at 23:21
0

You're far better off doing this without PHP if possible. Use mysqlimport to import the file, and then use:

-- assuming you use "ID" as your first column like the example 
-- on the mysqlimport page
DELETE FROM <tablename> WHERE CONV(CONV(ID, 16, 10), 10, 16) != 
                              TRIM(LEADING '0' FROM ID);

So, what's going on?

  1. First you've imported everything into a table.
  2. Then, you've looked through all of the columns which are supposed to be hex. If they cannot be converted to Hex and back (something which is only true of valid Hex numbers, everything else is truncated) and match the trimmed version of themselves (leading 0's need to be accounted for), they are not valid.

If you must use PHP, I would personally simply continually use MySQL's INSERT and I'd still call the above DELETE. I'll bet money that it's faster.

cwallenpoole
  • 79,954
  • 26
  • 128
  • 166
  • okay cool, thanks for your suggestion, this was my original train of thought also however being open to suggestion I had my mind changed but with all the hassle I am going through trying to figure this out its starting to become more apparent that the way you have described to do things feels a little more nature and to a degree logical. – C.Johns Jun 27 '11 at 04:31
  • Just a quick update, this did't work, it deleted alot of numbers that wernt hex but also ones that were hex... however I do think there is a way to achieve the required outcome through this convention... just need to conform more to the type of codes I have in my database...... they range from 00001-10C1R, its a massive database which makes use of the whole alphabet where I only want to include numbers that conform to hex 0-9 a-f... I will keep looking. – C.Johns Jun 27 '11 at 22:57
  • well it kinda works. but drops a ton of hex codes off the list.. need to review this in-depth. I think the TRIM(LEADING '0'... is doing something I need to fix. almost there however. – C.Johns Jun 27 '11 at 23:08
  • okay so they both do the same thing... I guess its personal preference however this is easier as it sorts everything out in the database there is no fluffing around conforming everything in php and passing it over. again thanks very much. – C.Johns Jun 27 '11 at 23:28