191

I have a file that can contain from 3 to 5 columns of numerical values separated by commas. Empty fields are always present (as two adjacent commas), except when they fall at the end of the row, where they are omitted entirely:

1,2,3,4,5
1,2,3,,5
1,2,3

The following table was created in MySQL:

+-------+--------+------+-----+---------+-------+
| Field | Type   | Null | Key | Default | Extra |
+-------+--------+------+-----+---------+-------+
| one   | int(1) | YES  |     | NULL    |       | 
| two   | int(1) | YES  |     | NULL    |       | 
| three | int(1) | YES  |     | NULL    |       | 
| four  | int(1) | YES  |     | NULL    |       | 
| five  | int(1) | YES  |     | NULL    |       | 
+-------+--------+------+-----+---------+-------+

I am trying to load the data using MySQL LOAD command:

LOAD DATA INFILE '/tmp/testdata.txt' INTO TABLE moo FIELDS 
TERMINATED BY "," LINES TERMINATED BY "\n";

The resulting table:

+------+------+-------+------+------+
| one  | two  | three | four | five |
+------+------+-------+------+------+
|    1 |    2 |     3 |    4 |    5 | 
|    1 |    2 |     3 |    0 |    5 | 
|    1 |    2 |     3 | NULL | NULL | 
+------+------+-------+------+------+

The problem is that when a field is present but empty in the raw data, MySQL for some reason does not use the column's default value (which is NULL) and stores zero instead. NULL is used correctly when the field is missing altogether.

Unfortunately, I have to be able to distinguish between NULL and 0 at this stage so any help would be appreciated.

Thanks S.

edit

The output of SHOW WARNINGS:

+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: '' for column 'four' at row 2 | 
| Warning | 1261 | Row 3 doesn't contain data for all columns             | 
| Warning | 1261 | Row 3 doesn't contain data for all columns             | 
+---------+------+--------------------------------------------------------+
Spiros
  • With data schema changes like that I would use [d6tstack](https://github.com/d6t/d6tstack) which aligns all columns before running `LOAD DATA`. See [d6tstack SQL examples](https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb) section on data schema changes. – citynorman Oct 14 '18 at 22:57

9 Answers

231

This will do what you want. It reads the fourth field into a local variable, and then sets the actual field value to NULL, if the local variable ends up containing an empty string:

LOAD DATA INFILE '/tmp/testdata.txt'
INTO TABLE moo
FIELDS TERMINATED BY ","
LINES TERMINATED BY "\n"
(one, two, three, @vfour, five)
SET four = NULLIF(@vfour,'')
;

If any of them may be empty, read them all into variables and assign each column in the SET clause, like this:

LOAD DATA INFILE '/tmp/testdata.txt'
INTO TABLE moo
FIELDS TERMINATED BY ","
LINES TERMINATED BY "\n"
(@vone, @vtwo, @vthree, @vfour, @vfive)
SET
one = NULLIF(@vone,''),
two = NULLIF(@vtwo,''),
three = NULLIF(@vthree,''),
four = NULLIF(@vfour,''),
five = NULLIF(@vfive,'')
;
Jacob
Duncan Lock
  • Theoretically, I suppose - but it's all in-memory, and only holding tiny amounts of data per row, so I would imagine it would be infinitesimal; but you should test it if you think it might be a problem. – Duncan Lock Apr 19 '13 at 03:00
  • 4
    I really like this answer. Users can see empty strings `''` when they download a csv (using `IFNULL(Col,'')` in `SELECT INTO OUTFILE` query) for excel but then uploads accept them as null vs having to deal with `\N` in the csv. Thanks! – chrisan Sep 29 '13 at 15:47
  • 11
    for dates I used 'NULLIF(STR_TO_DATE(@date1, "%d/%m/%Y"), "0000-00-00")' – Joaquín L. Robles Feb 23 '14 at 23:20
  • 1
    I have a csv file that contains zeros `0` that should be converted to `NULL` (because it is not possible to have zero value for the data in question) and also empty strings. How to make sure that both zeros and empty strings are converted to `NULL`? – Paul Rougieux Sep 11 '17 at 15:42
  • If the zero values and empty strings are in separate columns, then just do the above for the empty strings, and something like this for the zeros: `nullif(@vone, 0)`. – Duncan Lock Sep 12 '17 at 00:01
  • If they're both in the same column - i.e. a single source column that can contain either zero or empty strings, then you'll probably need to nest the `nullif` calls, e.g. `nullif(nullif(@vone, ''), 0)`. – Duncan Lock Sep 12 '17 at 00:04
  • @Blacksonic I am performing 10 `nullif()` operations resulting in an 8.7% increased import time. I was importing 300k records in 10 seconds. Import time increased to around 11 seconds with the added `nullif()` conditions – Kenneth Mar 13 '18 at 02:33
  • 2
    How to do this without mentioning specific columns? just for all? – user8411456 Feb 13 '20 at 19:28
  • 1
    what if you have 50 columns? setting up 50 columns seems overkill. is there a global way of telling Load function to just replace empty values with null? – ahsant Oct 30 '20 at 01:28
  • Lemme guess does using `NULLIF` disqualify me from using `IGNORE 1 LINES`? – Lori Apr 29 '22 at 03:23
164

MySQL manual says:

When reading data with LOAD DATA INFILE, empty or missing columns are updated with ''. If you want a NULL value in a column, you should use \N in the data file. The literal word “NULL” may also be used under some circumstances.

So you need to replace the blanks with \N like this:

1,2,3,4,5
1,2,3,\N,5
1,2,3
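
Where hand-editing the file is impractical, the replacement can be scripted. The following is a minimal Python sketch, not part of the original answer: `blanks_to_null_marker` is a hypothetical helper name, and the sketch assumes a plain comma-separated file and MySQL's default escape character (backslash), under which an unenclosed `\N` is read as NULL. Rows that simply end early, such as `1,2,3`, are left untouched.

```python
import csv
import io


def blanks_to_null_marker(src, dst, marker=r"\N"):
    """Copy CSV rows from src to dst, writing `marker` in place of empty fields.

    `src` and `dst` are open text file objects (hypothetical helper, for
    illustration only).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Only fields that are present but empty are replaced; fields that
        # are missing at the end of a short row are not added.
        writer.writerow([marker if field == "" else field for field in row])


# Usage with the sample data from the question, held in memory.
sample = io.StringIO("1,2,3,4,5\n1,2,3,,5\n1,2,3\n")
out = io.StringIO()
blanks_to_null_marker(sample, out)
print(out.getvalue())
```

For real files, open the input and output with `newline=""`, as the `csv` module documentation recommends, and write to a new file rather than overwriting the raw data.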
Janci
  • 3
    Thanks for the tip - I am sceptical to edit the raw source data but if this is the only way around it I will try it out. – Spiros Apr 20 '10 at 13:55
  • 9
    I understand your scepticism, no one likes editing raw data, it just doesn't feel right. However, if you think about it for a minute, there has to be a way to distinguish between NULL and empty string. Should blank entries be translated to NULLs, you'd need a special sequence for empty string. It would be nice to have a way to tell MySQL how to treat blank entries though, something like LOAD DATA INFILE '/tmp/testdata.txt' INTO TABLE moo TREAT BLANKS AS NULL... – Janci Apr 20 '10 at 14:17
  • 3
    OK, but if you have `Fields enclosed by: "` is that `"\N"` or `"name",\N,"stuff"`? – Jonathon Aug 25 '13 at 01:42
  • 4
    I can verify that at least for "phpMyAdmin 3.5.5" no style of `\N` is accepted as denoting `NULL`. Instead use `NULL`, as in this example: `"name","age",NULL,"other","stuff"` – Jonathon Aug 25 '13 at 02:05
  • 2
    We have MySQL 5.5.46-0+deb8u1. I tried both NULL and \N, and only \N worked for us. – raphael75 Jun 30 '16 at 12:02
  • Is there some CSV standard that recognizes `\N` as a database NULL value? Or is `\N` some MySQL thing? Something feels corrupt about altering a `.csv` file into a format that contains non-CSV syntax, even if I'm gonna throw it away afterward. The solution mapping ad-hoc variables to columns seems extremely inelegant, but I think I want to take inelegant over corrupt. – Lori Apr 29 '22 at 03:06
8

The behaviour depends on the database configuration. In strict mode this raises an error; otherwise it only produces a warning. The following query shows the current configuration:

mysql> show variables like 'sql_mode';
Dobi
  • Thanks! I was scratching my head trying to work out why importing a CSV with empty columns I'd successfully imported on the production server yesterday wasn't working on my brand-new local installation - this was the answer in my case! – Emma Burrows Jun 13 '16 at 15:54
5

Preprocess your input CSV to replace blank entries with \N.

Attempt at a regex (sed syntax): s/,,/,\\N,/g and s/,$/,\\N/g
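
A pair of substitutions like the one above misses runs of consecutive blank fields in a single pass, because each comma can take part in only one match. One way around this is a zero-width pattern that matches the empty position itself rather than the surrounding commas. The sketch below is an illustration under stated assumptions: `mark_empty_fields` is a hypothetical name, lines are passed without their trailing newline, and no field contains a quoted comma.

```python
import re

# An empty field is a zero-width position that is preceded by the start of
# the line or a comma, and followed by a comma or the end of the line.
_EMPTY_FIELD = re.compile(r"(?:^|(?<=,))(?=,|$)")


def mark_empty_fields(line, marker=r"\N"):
    """Return `line` with `marker` inserted into every empty field."""
    if line == "":
        # A completely empty line is not a row with one empty field.
        return line
    # A function is used as the replacement so that the backslash in the
    # marker is inserted literally instead of being parsed as an escape.
    return _EMPTY_FIELD.sub(lambda _match: marker, line)


print(mark_empty_fields("1,2,3,,5"))
print(mark_empty_fields("a,,,,b"))
```

Because the commas are never consumed by a match, one pass is enough regardless of how many blank fields appear in a row.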

Good luck.

Sam Goldman
  • 1
    This regex partially works: it doesn't handle consecutive blank entries. For example, ,,,, becomes ,\N,,\N, so it should be usable if you run it twice. – ievgen Jun 22 '16 at 19:12
  • 2
    Will summarize the answer and previous comment. Following worked for me, in the order: sed -i 's/,,/,\N/g' $file, sed -i 's/,,/,/g' $file, sed -i 's/\N,$/\N/g' $file, – Omar Khazamov Dec 03 '16 at 23:43
  • I would like to do this, but I am not clear on how you are running this regex. If you are using MySQL to run this against the file this would be the best solution. But you don't say and I don't want to spend a bunch of time googling how to do something that may not be possible. – DonkeyKong Jul 19 '19 at 03:57
4

First check which directory the server is allowed to read files from:

SHOW VARIABLES LIKE 'secure_file_priv';

Note: keep your CSV file in the location reported by the above command.

create table assessments (course_code varchar(5),batch_code varchar(7),id_assessment int, assessment_type varchar(10), date int , weight int);

Note: here the 'date' column has some blank values in the csv file.

LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/assessments.csv' 
INTO TABLE assessments
FIELDS TERMINATED BY ',' 
OPTIONALLY ENCLOSED BY '' 
LINES TERMINATED BY '\n' 
IGNORE 1 ROWS 
(course_code,batch_code,id_assessment,assessment_type,@date,weight)
SET date = IF(@date = '', NULL, @date);
1

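(variable1, @variable2, ...) SET variable2 = NULLIF(TRIM(@variable2), '')

This treats both '' and ' ' as NULL. Writing `NULLIF(@variable2, '' OR ' ')` does not test for either value, because `'' OR ' '` is a boolean expression that evaluates to 0, so it would also turn a literal 0 into NULL. Any other condition can be used in the SET clause.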

Said
1

I converted the input file to use \N for the blank columns with the following sed command in a Unix terminal:

sed -i 's/,,/,\\N,/g' $file_name

and then used the LOAD DATA INFILE command to load it into MySQL.

Aimnox
0

You can first read the file into a pandas DataFrame, replace the empty values with the string 'NULL' wherever you want NULL (dataframe_name.replace(value_to_be_replaced, 'NULL')), and save the new DataFrame in .csv format with the to_csv function.

Then import the CSV file into MySQL:

  1. Connect to the server with the local-infile option enabled:

mysql --local-infile=1 -u root -p

  2. Once connected, enable the global variable on the server:

SET GLOBAL local_infile=1;

  3. Select your database:

use <database_name>;

  4. Load the file:

load data local infile '<path_to_file>' into table <table_name> columns terminated by "," optionally enclosed by "'" ignore 1 lines;

All the NULL values in the dataset will then be loaded as NULL.

I hope it helps.

Md. Nashir Uddin
-1

MySQL reads empty fields as the empty string '', which is why loading them into numeric columns fails: '' is not a valid integer value. This happens even when the INT column is declared DEFAULT NULL in the CREATE TABLE statement. The straightforward solution is to preprocess the CSV and insert \N (not \n) in the fields that should be NULL. This can be done quickly with:

sed -i 's/,,/,\\N,/g' file.csv
sed -i 's/,,/,\\N,/g' file.csv

It is important to do it twice because consecutive blank fields will be skipped, since the second separator of a blank field is also the first separator of the next field, and it will be skipped after the first substitution.

In other words, if you use only one command, something,,,,SomethingElse will be converted to something,\N,,\N,SomethingElse.
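
The effect of the two passes can be reproduced outside sed. The snippet below is only an illustration, assuming Python's `str.replace`, which, like a global sed substitution, replaces non-overlapping matches from left to right; `one_pass` is a hypothetical helper name.

```python
def one_pass(line):
    """Replace each non-overlapping ',,' with ',\\N,' in a single pass."""
    return line.replace(",,", ",\\N,")


line = "something,,,,SomethingElse"

# First pass: the four commas are consumed as two separate pairs, so the
# blank field that sits between the two pairs is left empty.
first = one_pass(line)

# Second pass: the remaining ',,' is now an isolated pair and gets filled.
second = one_pass(first)

print(first)
print(second)
```

After the first pass no run of more than two commas can remain, which is why a second pass is always sufficient.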

Maybe there is a smarter way to do it with a more advanced command but this works just fine. You can loop through all csvs in a dir and run the command twice for each file. (reference)