HIVE SerDeproperties input regex

Question

My question is how to properly use `SERDEPROPERTIES` to parse the lines below. I have tried multiple variations, but my tables keep filling up with null values. Below are the SerDe definition and the sample data. From my understanding, `([^\s]*)` should match zero or more (`*`) characters that are not (`^`) whitespace (`\s`). Likewise, the next group should put everything before the line break into the next column.

My intent is to divide the numbers into one column and everything else into another column. What is wrong with my interpretation of the SerDe?

```sql
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "([^\s]*) ([^\n]*)");
```

```text
1134999 06Crazy Life
6821360 Pang Nakarin
10113088    Terfel, Bartoli- Mozart: Don
10151459    The Flaming Sidebur
6826647 Bodenstandig 3000
10186265    Jota Quest e Ivete Sangalo
6828986 Toto_XX (1977
```
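A simplified way to picture what RegexSerDe does with `input.regex` is the Java sketch below. It is a model under assumptions, not Hive's source: the class and method names are hypothetical, and it assumes (as I understand RegexSerDe to work) that the whole line must match the pattern, that capture group *n* becomes column *n*, and that a line which does not match produces a row of NULLs. The second input line is made up to show the no-match case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSerDeSketch {
    // Hypothetical, simplified model of RegexSerDe deserialization:
    // one capture group per column, and the whole line must match.
    static List<String> parseLine(Pattern inputRegex, String line) {
        Matcher m = inputRegex.matcher(line);
        int columns = m.groupCount();          // number of capture groups in input.regex
        boolean matched = m.matches();         // matches() anchors at both ends of the line
        List<String> row = new ArrayList<>();
        for (int g = 1; g <= columns; g++) {
            row.add(matched ? m.group(g) : null); // no match -> every column is NULL
        }
        return row;
    }

    public static void main(String[] args) {
        // The question's regex as the regex engine should receive it (single backslashes).
        Pattern p = Pattern.compile("([^\\s]*) ([^\\n]*)");
        System.out.println(parseLine(p, "1134999 06Crazy Life")); // line from the sample data
        System.out.println(parseLine(p, "1134999"));              // hypothetical line with no space
    }
}
```

Running it prints `[1134999, 06Crazy Life]` and then `[null, null]`, so a table full of NULLs suggests the regex is not matching complete lines at all.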

The regex is OK, but `(\d+) ([^\n]*)` would be better. There's something wrong with the hive code. — Laurel, Apr 04 '16 at 01:41
I am still seeing the same issue when I run the code. Everything loads as null. — Benny Baysinger, Apr 04 '16 at 01:53
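The regex suggested in the comment still requires exactly one literal space between the two fields. A quick Java check (hypothetical class name; `java.util.regex` stands in for the matching Hive does, which is an assumption) shows what that means for the sample: extra spaces end up at the start of the second column, and if the gaps in the real file are tabs rather than spaces (the uneven spacing in the sample hints at this, but it is only a guess) the line does not match at all. `(\d+)\s+(.*)` is a hypothetical alternative that accepts any run of whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompareRegexes {
    // Returns capture group 2 if the whole line matches, or null (the all-NULL row case).
    static String secondColumn(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String singleSpace = "(\\d+) ([^\\n]*)";  // regex suggested in the comment
        String anyWhitespace = "(\\d+)\\s+(.*)";  // hypothetical variant: any run of spaces/tabs
        String[] lines = {
            "6821360 Pang Nakarin",                     // one space, as shown in the question
            "10113088    Terfel, Bartoli- Mozart: Don", // four spaces, as shown in the question
            "10151459\tThe Flaming Sidebur"             // hypothetical: the same line with a tab
        };
        for (String line : lines) {
            System.out.println(secondColumn(singleSpace, line) + " | " + secondColumn(anyWhitespace, line));
        }
    }
}
```

With the single-space pattern, the second title keeps three leading spaces and the tab-separated line comes back as null; the `\s+` variant yields the bare title in all three cases.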

score 0 · Answer 1 · edited May 23 '17 at 11:50

Try this (or something similar):

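```sql
-- Contrib RegexSerDe (hive-contrib jar); it only accepts STRING columns.
-- Backslashes are doubled because Hive unescapes the quoted string first,
-- so the regex actually used is (\d+) ([^\n]*).
-- output.format.string only matters when rows are written back out.
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES  (
   "input.regex" = "(\\d+) ([^\\n]*)",
   "output.format.string" = "%1$s %2$s"
)
STORED AS TEXTFILE;
```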

Modified from here.
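Note the doubled backslashes in `input.regex`. My understanding, to be treated as an assumption rather than documented behaviour, is that Hive unescapes the quoted string in the DDL before the SerDe sees it, and that for an escape it does not recognise, such as `\s` or `\d`, it keeps only the character after the backslash. The sketch below is a hypothetical, simplified model of that step (not Hive's source; only a few escapes are modelled) showing why `\\d` is written so that the regex engine receives `\d`.

```java
public class HiveLiteralSketch {
    // Hypothetical, simplified model of Hive unescaping a quoted DDL string:
    // \n, \t and \\ are translated; for any other escape (e.g. \s, \d)
    // the backslash is dropped and only the following character is kept.
    static String unescape(String literalBody) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literalBody.length(); i++) {
            char c = literalBody.charAt(i);
            if (c == '\\' && i + 1 < literalBody.length()) {
                char next = literalBody.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    case '\\': out.append('\\'); break;
                    default: out.append(next); // unrecognised escape: backslash dropped
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Each Java literal spells out exactly what was typed between the quotes in the DDL.
        System.out.println(unescape("([^\\s]*)"));  // typed with one backslash
        System.out.println(unescape("(\\\\d+)"));   // typed with two backslashes
    }
}
```

Under this model the question's `([^\s]*)` reaches the regex engine as `([^s]*)` ("anything except the letter s"), while the answer's `(\\d+)` arrives as the intended `(\d+)`.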

answered Apr 04 '16 at 01:59 by Laurel
