0

Im new to HIVE and creating my first table!

for some reason all non-string values are showing as NULL (including int, BOOLEAN, etc.)

my data looks like this sample row:

58;"management";"married";"tertiary";"no";2143;"yes";"no";"unknown";5;"may";261;1;-1;0;"unknown";"no"

i used this to create the table:

create external table bank_dataset(
age       TINYINT, 
job       string, 
education string, 
default   BOOLEAN, 
balance   INT, 
housing   BOOLEAN, 
loan      BOOLEAN, 
contact   STRING,
day       STRING, 
month     STRING, 
duration  INT,
campaign  INT, 
pdays     INT, 
previous  INT,
poutcome  STRING,
y         BOOLEAN)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u003B'
STORED AS TEXTFILE
location '/user/marchenrisaad_gmail/Bank_Project'
tblproperties("skip.header.line.count"="1");
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Because hive can not process enclosing double quotes. So, first remove double quotes from your data using some shell script then load it. Pls let me know if it helps. – Koushik Roy Oct 27 '20 at 06:27
  • Read this answer: https://stackoverflow.com/a/54463863/2700344 – leftjoin Oct 27 '20 at 07:49
  • Also booleans will not work in your case. Better convert them using case statements. Read this: https://stackoverflow.com/a/55316873/2700344 Also columns in the sample data and table definition columns are different – leftjoin Oct 27 '20 at 08:27
  • check my answer i have a small issue – Marc Henry Saad Oct 27 '20 at 21:04

1 Answers1

0

Thanks for the comments it worked! but i have 1 issue. For every row i get all the data correctly then I get extra columns of null values. Find below my code:

create external table bank_dataset(age TINYINT, job string, education string, default BOOLEAN, balance INT, housing BOOLEAN, loan BOOLEAN, contact STRING,day INT, month STRING, duration INT,campaign INT, pdays INT, previous INT, poutcome STRING,y BOOLEAN)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\u003B",
   "quoteChar"     = '"'
)  
STORED AS TEXTFILE
location '/user/marchenrisaad_gmail/Bank_Project'
tblproperties("skip.header.line.count"="1");

Any suggestions?