My question is how to properly use SERDEPROPERTIES to parse the lines below. I have tried multiple variations, but my tables keep filling with NULL values. Below are the SerDe definition and the sample data. From my understanding, ([^\s]*) should capture everything before the first whitespace: [^\s] matches any non-whitespace character, and * repeats it zero or more times. Likewise, ([^\n]*) should put everything up to the line break into the next column.

My intent is to divide the numbers into one column and everything else into another column. What is wrong with my interpretation of the SerDe?

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ("input.regex" = "([^\s]*) ([^\n]*)");

1134999 06Crazy Life
6821360 Pang Nakarin
10113088    Terfel, Bartoli- Mozart: Don
10151459    The Flaming Sidebur
6826647 Bodenstandig 3000
10186265    Jota Quest e Ivete Sangalo
6828986 Toto_XX (1977  
Benny Baysinger

1 Answer

Try this (or something similar):

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES  (
   "input.regex" = "(\\d+) ([^\\n]*)",
   "output.format.string" = "%1$s %2$s"
)
STORED AS TEXTFILE;

Modified from here.
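
To see why a row can come back as NULL, here is a minimal sketch that checks the patterns outside Hive. It uses Python's re.fullmatch as a stand-in for the whole-line match that, as far as I know, RegexSerDe performs with Java regexes. The tab in the third row is an assumption: the wider gaps in the sample data look like expanded tabs. The \s+ variant and the split_row helper are my own illustration, not part of the answer above.

```python
import re

# Sample rows from the question. The tab in the third row is an ASSUMPTION:
# the wide gaps in the posted data look like tabs expanded to spaces.
rows = [
    "1134999 06Crazy Life",
    "6821360 Pang Nakarin",
    "10113088\tTerfel, Bartoli- Mozart: Don",
    "6826647 Bodenstandig 3000",
]

# Pattern from the answer: the two groups are separated by one literal space.
single_space = re.compile(r"(\d+) ([^\n]*)")

# Hypothetical variant: accept any run of whitespace (spaces or tabs).
any_whitespace = re.compile(r"(\d+)\s+([^\n]*)")


def split_row(pattern, row):
    """Return (id, name) if the WHOLE row matches, else None.

    None here corresponds to a row whose columns would all be NULL.
    """
    match = pattern.fullmatch(row)
    return match.groups() if match else None


for row in rows:
    print(repr(row))
    print("  single space  :", split_row(single_space, row))
    print("  any whitespace:", split_row(any_whitespace, row))
```

If the tab-separated row only matches the second pattern, writing the separator as \\s+ (or \\t) in "input.regex" is probably the safer choice. Remember that inside the Hive string literal each backslash has to be doubled, as in the answer above.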

Community
Laurel