2

I am new to Hive and am creating external tables on csv file. One of the issues I am coming across are values that contain multiple commas within string itself. For example, the csv file contains the following:

CSV File

When I create an external table in Hive, because there are columns within the "name" column, it shifts the first name to the right adding another column. This throws all of the data off when you view the table in Hive.

External Table result in Hive

Is there anything I can add to my script to keep the commas but also keep first and last name in the same column when the external table is created? Thank you all in advance - I am very new to Hive.

CREATE EXTERNAL TABLE database.table name (
ID INT,
Name String,
City String,
State String
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/xyz/xyz/database/directory/'
TBLPROPERTIES ("skip.header.line.count"="1");
Rspktcod
  • 23
  • 6
  • What about if you process the data first in MapReduce or with Spark/Flink program?? – Kenry Sanchez Mar 21 '19 at 14:19
  • Look at this question https://stackoverflow.com/questions/13628658/hive-load-csv-with-commas-in-quoted-fields – Kenry Sanchez Mar 21 '19 at 14:20
  • Possible duplicate of [Hive load CSV with commas in quoted fields](https://stackoverflow.com/questions/13628658/hive-load-csv-with-commas-in-quoted-fields) – Vin Mar 21 '19 at 14:34

1 Answers1

0

Check this solution - you need to add this line : ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

https://community.cloudera.com/t5/Support-Questions/comma-in-between-data-of-csv-mapped-to-external-table-in/td-p/220193

Complete DDL example:

create table hcc(field1 string,
field2 string,
field3 string,
field4 string,
field5 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\"");
Khaja Nizamuddin
  • 167
  • 1
  • 2
  • 9