0

Alright so I've looked around and haven't quite found something like this online

I have to write a script to generate minimum of 5 million unique rows of test data for a table, and these are the columns it needs: • First Name • Last Name • City • State

(can't be gibberish data)

suggestions on how to do this at all? If possible, I would appreciate some code examples to help me get going. The lecture notes and slides are completely useless and I have no idea how it's supposed to be done.

I found this, and the top answer looks like the closest thing to what I'm doing, just not sure how to change/adjust the columns and attributes in the create table statement to match mine: How to generate 1000000 rows with random data?

Jaluvshuskies
  • 35
  • 1
  • 7
  • That answer at that link is just generating random numbers, which isn't *terrible*, but probably not exactly what are you looking for. Instead [creating temp tables](https://stackoverflow.com/questions/2671518/how-to-create-a-temporary-table-in-oracle) like @PatrickSJ suggests below and then Cross join them to create your [INSERT... SELECT...](https://docs.oracle.com/cd/E17952_01/mysql-5.6-en/insert-select.html) statement. Just make sure you sneak some funny first names into the list. – JNevill Dec 08 '17 at 22:14

1 Answers1

2

Make 4 tables, 1 for each column and do a cross join that you select into your random data table. In each table put your values, e.g., the first_name table has 100 names, last_name has 100 names, city as 100 names, and state as 50. 100 * 100 * 100 * 50 = 50 million combinations.

Just make sure each table has no duplicates.

Edit: In the end @Jaluvshuskies and I had this discussion in chat. For anyone who comes across this and wonders what we did: we made the insert statements in Excel from a copy-paste list of googled first and last names, cities, and 50 USA states.

Each entry was 1 insert statement, e.g.,

INSERT INTO first_name (firstname) VALUES ('Sophia');
INSERT INTO first_name (firstname) VALUES ('Olivia');
INSERT INTO first_name (firstname) VALUES ('Emma');
INSERT INTO first_name (firstname) VALUES ('Ava');
INSERT INTO first_name (firstname) VALUES ('Isabella');

The then an insert from a cross join subquery and all done.

PatrickSJ
  • 510
  • 5
  • 13
  • This is a good way of pulling this off. One of the few times it's acceptable to create a cross join. – JNevill Dec 08 '17 at 22:11
  • with the 100 values in each table, do i manually have to enter? like if i just looked up 100 random first names (with no duplicates), would i just paste that for example? same thing with city, state, and last name – Jaluvshuskies Dec 08 '17 at 22:38
  • I'd just pull the names from a baby names and family names webpage. Since this is usually already a list / single column tabular format I jump over to Excel to quickly add the ' ' around the strings. Then use `INSERT INTO tmp_first_name VALUES ('name_1', 'name_2', etc)`. Excel can easily format the list. Just use `="'"&A1&"'," ` then remove the extra , from the last item. – PatrickSJ Dec 08 '17 at 22:42
  • ok sorry I'm actually having a lot of trouble figuring out the excel thing. I got the data in a column (atm im just doing first name) but i cant figure out for my life the formula. i did what you typed but all it entered was just the 1st value in A1 – Jaluvshuskies Dec 08 '17 at 23:09
  • Are the names in a list, e.g., A1, B1, C1, etc? Just go a drag-down (copy-down, whatever it is called) in Excel to copy the formula and adjust for each row. See the little square on the bottom-right corner? Select that and drag. Also, slight typo on the insert statement. It is `INSERT INTO tmp_first_name VALUES ('name_1'), ('name_2')` etc. – PatrickSJ Dec 08 '17 at 23:13
  • nah, so firstnames are all in 1 column, which is column A. then at the very bottom i was going to put the formula but it doesnt seem to work right. every list of names i could find online were in a vertical list lol – Jaluvshuskies Dec 08 '17 at 23:18
  • In A1 type `'(''` that is 2 single quotes, not 1 double quote, in B1 put the first item of the list. In C1 type `''),'` and again that is 2 single quotes not 1 double quote. Does it look like `(' name_1 '),` when you see all 3 cells? If so, the formatting is correct and you can just concatenate. Then copy-paste the formula for all 100 columns (you can do it en masse). – PatrickSJ Dec 08 '17 at 23:29
  • is the formula still ="'"&A1&"'," ? (i put this in D1). cause i tried that formula and i got '(''', – Jaluvshuskies Dec 08 '17 at 23:52
  • Copy-paste of ="('"&A1&"')," gives...? – PatrickSJ Dec 08 '17 at 23:54
  • copy paste ="('"&A1&"')," give this: ('('''), – Jaluvshuskies Dec 08 '17 at 23:55
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/160827/discussion-between-jaluvshuskies-and-patricksj). – Jaluvshuskies Dec 08 '17 at 23:56
  • 2
    If you were able to find lists of values and put them in Excel, you shouldn't have needed to create all those INSERT statements. There are other ways: export to CSV and import (for example using SQL Developer), or create external table to read directly from the CSV file. –  Dec 09 '17 at 02:23
  • True. Didn't think of it. – PatrickSJ Dec 09 '17 at 02:49