I have a dataset in the form of a CSV file that is sent to me on a regular basis. I want to import this data into my MySQL database and turn it into a proper set of tables. The problem I am having is that one of the fields is used to store multiple values. For example, the field stores email addresses: it may have one email address, or two, or three, or four, etc. The field contents look something like this: `"user1@domain.com,user2@domain.com,user3@domain.com"`.

I need to be able to take the undetermined number of values from each field and add them to a separate table so that they look like this:

user1@domain.com  
user2@domain.com  
user3@domain.com

I am not sure how I can do this. Thank you for the help.
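
For reference, a minimal sketch of the staging and target tables involved; the names match those used in the answers below, and the column types are assumptions:

    create table emails (
        emails varchar(255)   -- raw field, e.g. 'user1@domain.com,user2@domain.com'
    );

    create table newtable (
        email varchar(255)    -- one address per row
    );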

user1757006

  • Which programming language do you want to use? – Peter Oct 27 '14 at 21:21
  • possible duplicate of ["Reverse GROUP_CONCAT" in MySQL?](http://stackoverflow.com/questions/17308669/reverse-group-concat-in-mysql) – jpw Oct 27 '14 at 21:22

3 Answers

Probably the simplest way is a brute force approach of inserting the first email, then the second, and so on:

insert into newtable(email)
    select substring_index(substring_index(emails, ',', 1), ',', -1)
    from emails
    where (length(replace(emails, ',', ',,')) - length(emails)) >= 1;

insert into newtable(email)
    select substring_index(substring_index(emails, ',', 2), ',', -1)
    from emails
    where (length(replace(emails, ',', ',,')) - length(emails)) >= 2;

insert into newtable(email)
    select substring_index(substring_index(emails, ',', 3), ',', -1)
    from emails
    where (length(replace(emails, ',', ',,')) - length(emails)) >= 3;

And so on.

That is, extract the nth element from the list and insert it into the table. The where clause counts the number of commas in the list: replacing each comma with two commas lengthens the string by one character per comma, so the length difference equals the comma count, which is a proxy for the number of elements.

You need to repeat this up to the maximum number of emails in the list.
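
If the maximum list length is known, the repeated statements can also be collapsed into a single insert driven by a small derived numbers table. A sketch of that approach, assuming the same emails table and column names as above (extend the union to cover the longest list you expect):

    insert into newtable(email)
        select substring_index(substring_index(e.emails, ',', n.n), ',', -1)
        from emails e
        join (select 1 as n union all select 2 union all select 3
              union all select 4 union all select 5) n
          on n.n <= length(e.emails) - length(replace(e.emails, ',', '')) + 1;

The join condition uses the same comma-counting trick (commas plus one equals the number of elements), so each source row is expanded exactly once per address.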

Gordon Linoff

Instead of importing the CSV file directly and then trying to fix the problems in it, I found the best way to attack this was to first pass the CSV through AWK.

AWK outputs three separate CSV files that follow the normal forms. I then import those files and all is well.

# Fetch the latest CSV from the FTP server.
info="`ncftpget -V -c -u myuser -p mypassword ftp://fake.com/data_map.csv`"

# table1.csv: one row per (record id, value), splitting the multi-valued
# sixth field on ";" and stripping everything up to the last ":" in each piece.
echo "$info" | \
awk -F, -v OFS="," 'NR > 1 {
    split($6, keyvalue, ";")
    for (var in keyvalue) {
        gsub(/.*:/, "", keyvalue[var])
        print $1, keyvalue[var]
    }
}' > ~/sqlrw/table1.csv

# table2.csv: the key/value pairs themselves, with ":" turned into ",".
echo "$info" | \
awk -F, -v OFS="," 'NR > 1 {
    split($6, keyvalue, ";")
    for (var in keyvalue) {
        gsub(/:/, ",", keyvalue[var])
        print keyvalue[var]
    }
}' > ~/sqlrw/table2.csv

# Deduplicate table2.
sort -u ~/sqlrw/table2.csv -o ~/sqlrw/table2.csv

# table3.csv: the remaining columns, minus the multi-valued field.
echo "$info" | \
awk -F, -v OFS="," 'NR > 1 {
    print $1, $2, $3, $4, $5, $7, $8
}' > ~/sqlrw/table3.csv
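
The resulting files can then be loaded with LOAD DATA. A minimal sketch for the first file, assuming a target table table1 with columns (record_id, email); the path and column names are placeholders:

    LOAD DATA LOCAL INFILE '/home/me/sqlrw/table1.csv'
    INTO TABLE table1
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (record_id, email);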
user1757006

Maybe a simple PHP script would do the trick:

<?php
    // Connection details are placeholders.
    $mysqli = new mysqli("localhost", "user", "password", "my_db");
    $stmt = $mysqli->prepare("INSERT INTO my_table (email) VALUES (?)");
    // If my_table has no auto-increment id, add an id column to the
    // INSERT and bind a running counter as well.

    $handle = fopen("my_file.csv", "r");
    while (($row = fgetcsv($handle)) !== false) {
        // The multi-valued email field is assumed to be the first column;
        // split it on commas and insert one row per address.
        foreach (explode(",", $row[0]) as $email) {
            $email = trim($email);
            $stmt->bind_param("s", $email);
            $stmt->execute();
        }
    }
    fclose($handle);
?>

Hope this helps even if it's not pure SQL.

Stv