0

Recently I needed to use a set of real data from my previous project for demonstration purpose. While most data are real, personal information like name and email needed to be replaced by random generated data. I already got around few thousands set of name and email generated by using service like mockaroo.

Is there a way to insert these data quick using SQL? As I need to keep other data, I can only update the 2 fields instead of inserting new rows. I have an ID field as primary key, but it is not consecutive due to deleting rows in actual use. It would be best (but not a must) if it is in this format:

Some SQL UPDATE statement
("Russell Dixon", "jburkel@lulu.com"),
("Todd Gonzales", "sbishopq@jimdo.com"),
("Roger Green", "nhicksp@dyndns.org"),
.........
("Jason Anderson", "bgrayo@storify.com"),
("Gloria Larson", "rgonzalesn@upenn.edu"),
("Eric Bishop", "jcastillom@cnet.com")
Some SQL statment or empty
cytsunny
  • 4,838
  • 15
  • 62
  • 129
  • What RDBMS do you use? – Aleksey Ratnikov Aug 31 '16 at 08:10
  • Can't you just use multiple UPDATE statements? – Ron Deijkers Aug 31 '16 at 08:11
  • Possible duplicate of [Inserting multiple rows in mysql](http://stackoverflow.com/questions/6889065/inserting-multiple-rows-in-mysql) – Elzo Valugi Aug 31 '16 at 08:12
  • @RonDeijkers How? i.e. what should be the where statement? If I don't use a where statement, all rows will be equal to the last update. I have a id field as primary key but the number is not consecutive due to deleting data during actual use. – cytsunny Aug 31 '16 at 08:18
  • @Vasan As mentioned in the question, they are fake data generated by mockaroo. It looks real but they are 100% fake. – cytsunny Aug 31 '16 at 08:18
  • Check also https://stackoverflow.com/questions/3432/multiple-updates-in-mysql – Elzo Valugi Aug 31 '16 at 08:19
  • @ElzoValugi Good suggestion, but the problem is that while I have an ID field as primary key, it is not consecutive due to deleting row in actual use. Is there a way to update the field without specifying the ID? – cytsunny Aug 31 '16 at 08:21
  • You need to identify the row that you want to update somehow, otherwise each update will overwrite multiple or all entries. So either by id, either by another unique key, or by some column value like @Iain suggested in his/her answer – Elzo Valugi Aug 31 '16 at 08:23
  • @ElzoValugi Um... my ID is unique, it is just painful to add it by hand when it is not consecutive.... Is there a way to at least auto skip if ID not exist? – cytsunny Aug 31 '16 at 08:31
  • Insert your fake data into a temp table. Then write an UPDATE ... JOIN statement. – Paul Spiegel Aug 31 '16 at 08:36
  • Instead of looking for a SQL command to do this, how about writing a small script in your programming language of choice, where you connect to the db, get a list of existing and valid IDs and then update each row info with the new data? Or if you really want it into SQL, write a stored procedure doing the same logic as for the program. – Elzo Valugi Aug 31 '16 at 08:36
  • @ElzoValugi That's exactly how I have already solved the problem, but I am just asking here to see if there are easier way to solve the problem, as I think I will do similar things again some day. – cytsunny Sep 01 '16 at 07:41
  • SQL is it used and known for its relational properties, not for how easy is to do things with it :) – Elzo Valugi Sep 01 '16 at 07:50
  • @ElzoValugi True, but in my case, I need to setup running environment if I want to run other language script while I can just use SQL without any setup. It is the setup time that makes using SQL easier for me, and I think this case happens in many other developer and thus is worth asking. – cytsunny Sep 01 '16 at 07:55

3 Answers3

0

Try this:

UPDATE table
   SET email = CASE  
                      WHEN ID = 1 THEN "jburkel@lulu.com" 
                      etc....
              END

Edit: I just reread your question and think what you need is just a mass insert:

INSERT INTO table ( Column1, Column2 ) VALUES
( Value1, Value2 ), ( Value1, Value2 )
Iain
  • 387
  • 2
  • 12
  • In the question, "Russell Dixon" is the new name to be updated. It does not exist in the original data. – cytsunny Aug 31 '16 at 08:19
0

Possible way to do it:-

INSERT INTO some_table(ID, name, email)
SELECT sub1.ID, sub3.aName, sub3.aEmail
FROM
(
    SELECT ID, @seq1:=@seq1 + 1 AS seq
    FROM some_table
    CROSS JOIN (SELECT @seq1 := 0) sub0
) sub1
INNER JOIN
(
    SELECT aName, aEmail, @seq2:=@seq2 + 1 AS seq
    FROM
    (
        SELECT "Russell Dixon" AS aName, "jburkel@lulu.com" AS aEmail
        UNION SELECT "Todd Gonzales", "sbishopq@jimdo.com"
        UNION SELECT "Roger Green", "nhicksp@dyndns.org"
        UNION SELECT "Jason Anderson", "bgrayo@storify.com"
        UNION SELECT "Gloria Larson", "rgonzalesn@upenn.edu"
        UNION SELECT "Eric Bishop", "jcastillom@cnet.com"
    ) sub2
    CROSS JOIN (SELECT @seq2 := 0) sub3
) sub3
ON sub1.seq = sub3.seq
ON DUPLICATE KEY UPDATE name=VALUES(name), email=VALUES(email);

This is selecting all the records from your existing table and adding a sequence number, and getting all the update names and adding a sequence number to them. Then the records are joined on the sequence number matching, and used as the source to do an insert. ON DUPLICATE KEY UPDATE is used on the update so that the records will be updated with the changes random name and email address.

If you are doing this with a lot of records it is probably easier to insert the names and email addresses into a table and then select from that rather than selecting constants.

Kickstart
  • 21,403
  • 2
  • 21
  • 33
0

You can create two temporary tables with auto-increment columns. You can then join them using those auto-increment columns.

create temporary table tmp_userids (
    row_nr int auto_increment primary key,
    user_id int
)
    select null as row_nr, id as user_id
    from users
;

create temporary table tmp_fakedata (
    row_nr int auto_increment primary key,
    name varchar(50),
    email varchar(50)
);

insert into tmp_fakedata(name, email) values
    ("Russell Dixon", "jburkel@lulu.com"),
    -- ........
    ("Eric Bishop", "jcastillom@cnet.com")
;

update user u
join tmp_user_ids i on i.user_id = u.id
join tmp_fakedata f on f.row_nr  = i.row_nr
set u.name  = f.name,
    u.email = f.email
Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53