I am learning SQL, and am trying to work with some sample data to understand various concepts.
Here are what the first few lines of the Yahoo! Search Marketing Advertiser Bid-Impression-Click data on competing Keywords data look like:
1 08bade48-1081-488f-b459-6c75d75312ae 2 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 2.0 0.0
29 08bade48-1081-488f-b459-6c75d75312ae 3 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 1.0 0.0
29 08bade48-1081-488f-b459-6c75d75312ae 2 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 1.0 0.0
11 08bade48-1081-488f-b459-6c75d75312ae 1 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 2.0 0.0
76 08bade48-1081-488f-b459-6c75d75312ae 2 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 1.0 0.0
48 08bade48-1081-488f-b459-6c75d75312ae 3 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 2.0 0.0
97 08bade48-1081-488f-b459-6c75d75312ae 2 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 1.0 0.0
123 08bade48-1081-488f-b459-6c75d75312ae 5 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 1.0 0.0
119 08bade48-1081-488f-b459-6c75d75312ae 3 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 1.0 0.0
73 08bade48-1081-488f-b459-6c75d75312ae 1 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 1.0 0.0
Please note that this data is only available from Yahoo! on request.
Here is the description of the data:
(1) "ydata-ysm-keyphrase-bid-imp-click-v1_0.gz" contains the following fields:
0 day
1 anonymized account_id
2 rank
3 anonymized keyphrase (expressed as list of anonymized keywords)
4 avg bid
5 impressions
6 clicks
Primary key of the data is a combination of fields date, account_id, rank and keyphrase. Average bid, impressions and
clicks information is aggregated over the primary key.
I am trying to load this data from a flat file into a MySQL table using the load data infile
statement.
In general, I think that I understand how to specify the data types of the columns, but am not sure about 3 anonymized keyphrase (expressed as list of anonymized keywords)
:
Question 1: Should I specify them as separate varchar
columns, or is there a datatype that allows them to be stored together as "list" type?
Here is the query that I currently have to create the table to hold this data.
# create a new database
create database webscopedata;
show databases;
use webscopedata;
# create the table
drop table bidders;
create table bidders (
daynum int,
account_id varchar(40),
rank int,
keyphrase1 varchar(100),
keyphrase2 varchar(100),
keyphrase3 varchar(100),
keyphrase4 varchar(100),
keyphrase5 varchar(100),
avg_bid double,
impressions double,
clicks double);
Now when I try to run the query
load data infile "ydata-ysm-keyphrase-bid-imp-click-v1_0" into table bidders fields terminated by "\t";
That is, when I specify the tab separator, I get the error:
Error Code: 1261. Row 1 doesn't contain data for all columns
This leads me to believe that the fields are not properly delimited when specifying the tab delimiter. So I try to specify multiple delimiters using the query:
load data infile "ydata-ysm-keyphrase-bid-imp-click-v1_0" into table bidders fields terminated by "' '\t";
Here I am trying to use multiple field terminators, but that does not seem to work, and I get an error:
Error Code: 1265. Data truncated for column 'daynum' at row 1
Question 2: How can I specify parsing of this data with multiple delimiters?