1
CREATE TABLE `rk_test22`(
`index` int, 
`country` string, 
`description` string, 
`designation` string, 
`points` int, 
`price` int, 
`province` string, 
`region_1` string, 
`region_2` string, 
`taster_name` string, 
`taster_twitter_handle` string, 
`title` string, 
`variety` string, 
`winery` string)
ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
'input.regex'=',(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)') 
STORED AS INPUTFORMAT 
'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://namever/user/hive/warehouse/robert.db/rk_test22'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true', 
'numFiles'='1', 
'skip.header.line.count'='1', 
'totalSize'='52796693', 
'transient_lastDdlTime'='1516088117');

I created the hive table using above command. Now I want to load the following line (in CSV file) into table using load data command. The load data command shows status OK but i cannot see data into that table.

0,Italy,"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco  (Etna),White Blend,Nicosia
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Something simillar happened to me once. My problem was that the regex expression was not right, so the data was there, but was unreadable. Can you check that? – SCouto Jan 19 '18 at 07:10

1 Answers1

0

If you are loading one line CSV file then that line is skipped because of this property: 'skip.header.line.count'='1'

Also Regex should contain one capturing group for each column. Like in this answer: https://stackoverflow.com/a/47944328/2700344

And why do you provide these settings in table DDL:

'COLUMN_STATS_ACCURATE'='true'
'numFiles'='1', 
'totalSize'='52796693', 
'transient_lastDdlTime'='1516088117'

All these should be set automatically after DDL and ANALYZE.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thanks it worked removed it 'skip.header.line.count'='1' – Harshit Mehta Jan 19 '18 at 07:44
  • @HarshitMehta And Regex? is your regex correct? If it works, could you please explain it? I'd like to know how does it work – leftjoin Jan 19 '18 at 07:48
  • 1
    , - comma ?: - it means that the group is matched but is not captured for back-referencing i.e non-capturing group (?=) - positive lookahead. What it's saying is that the captured match must be followed by whatever is within the parentheses but that part isn't captured [^\"] - any character which is not "(" or ")" * - zero or more times \\) - followed by closing parentheses Instead of writing ,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$) we can write "\"(,\")?" i.e., \" - a double quote character (,\") - followed by comma and double quote zero or 1 time – Harshit Mehta Jan 19 '18 at 09:00