2

FIY:

  • I'm working with a CVS file from Census - FactFinder
  • Using MySQL 5.7
  • OS is Windows 10 PRO

So, I created this table:

+----------+------------+------+-----+---------+-------+
| Field    | Type       | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------+
| SERIALNO | bigint(13) | NO   | PRI | NULL    |       |
| DIVISION | int(9)     | YES  |     | NULL    |       |
| PUMA     | int(4)     | YES  |     | NULL    |       |
| REGION   | int(1)     | YES  |     | NULL    |       |
| ST       | int(1)     | YES  |     | NULL    |       |
| ADJHSG   | int(7)     | YES  |     | NULL    |       |
| ADJINC   | int(7)     | YES  |     | NULL    |       |
| FINCP    | int(6)     | YES  |     | NULL    |       |
| HINCP    | int(6)     | YES  |     | NULL    |       |
| R60      | int(1)     | YES  |     | NULL    |       |
| R65      | int(1)     | YES  |     | NULL    |       |
+----------+------------+------+-----+---------+-------+

And tried to load data using:

LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'

It didn`t work as this message appear:

ERROR 1366 (HY000): Incorrect integer value: '' for column 'FINCP' at row 2

The row the error message is referring to is:

2012000000051,3,104,2,17,1045360,1056030,,8200,1,1

I believed FINCP which is the blank value ,, right before 8200 is the problem. So I followed this thread instructions: MySQL load NULL values from CSV data

And updated my code to:

LOAD DATA INFILE "C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/Housing_Illinois.csv"
INTO TABLE housing
CHARACTER SET latin1
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@SERIALNO, @DIVISION, @PUMA, @REGION, @ST, @ADJHSG, @ADJINC, @FINCP, @HINCP, @R60, @R65)
SET
SERIALNO = nullif(@SERIALNO,''),
DIVISION = nullif(@DIVISION,''),
PUMA = nullif(@PUMA,''),
REGION = nullif(@REGION,''),
ST = nullif(@ST,''),
ADJHSG = nullif(@ADJHSG,''),
ADJINC = nullif(@ADJINC,''),
FINCP = nullif(@FINCP,''),
HINCP = nullif(@HINCP,''),
R60 = nullif(@R60,''),
R65 = nullif(@R65,'');

The first error is now gone but this message appears:

' for column 'R65' at row 12t integer value: '

The row at which this message is referring to is:

2012000000318,3,1602,2,17,1045360,1056030,,,,

There's no error message so I don't know what exactly is the problem. I can only assume that the problem is that there are four consecutive blank values.

Another tip, if I use CSV and change all blank to 0 the code goes smoothly, but I`m not a fan or editing raw data so I would like to know other options.

Bottom line, I have two questions:

  1. Shouldn`t data be loaded with the first code as MySQL should take ,, as null and 0 a plain 0?

  2. What's the problem I'm getting now that I'm using SERIALNO = nullif(@SERIALNO,'')

I want to be able to differentiate between 0 and null/blank values.

Thank you.

Jason Aller
  • 3,541
  • 28
  • 38
  • 38
René Martínez
  • 179
  • 1
  • 3
  • 11

1 Answers1

1

MySQL's LOAD DATA tool interprets \N as being a NULL value. So, if your troubled row looked like this:

2012000000318,3,1602,2,17,1045360,1056030,\N,\N,\N,\N

then you might not have this problem. If you have access to a regex replacement tool, you may try searching for the following pattern:

(?<=^)(?=,)|(?<=,)(?=,)|(?<=,)(?=$)

Then, replace with \N. This should fill in all the empty slots with \N, which semantically will be interpreted by MySQL as meaning NULL. Note that if you were to write a table out from MySQL, then nulls would be replaced with \N. The issue is that your data source and MySQL don't know about each other.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360