Questions tagged [regexserde]

RegexSerDe is part of the Hive SerDe library, which Hive uses to serialize and deserialize data structures efficiently and generically. Hive uses the SerDe interface for IO. RegexSerDe is short for Regular Expression Serializer/Deserializer. Use this tag for questions related to RegexSerDe usage and development.

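RegexSerDe uses a regular expression (regex) to serialize and deserialize data. It deserializes each row by matching it against the regex and extracting the capture groups as columns. The contrib variant (org.apache.hadoop.hive.contrib.serde2.RegexSerDe) can also serialize the row object using a format string; the built-in org.apache.hadoop.hive.serde2.RegexSerDe is deserialize-only. RegexSerDe is as much a part of the standard Hive distribution as the other SerDes currently in hive-serde. The Hive SerDe library is in org.apache.hadoop.hive.serde2.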
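
To make the group-to-column mapping concrete, here is a minimal Python sketch of the deserialization idea. It is not Hive code: Hive evaluates the pattern with Java's regex engine, and the `deserialize` helper and the sample rows are hypothetical. The pattern and column names are borrowed from one of the questions listed under this tag, and the sketch assumes that the whole row must match the pattern and that a non-matching row produces NULL in every column.

```python
import re

# Pattern and column names borrowed from a question under this tag;
# the helper itself and the sample rows are illustrative assumptions.
INPUT_REGEX = r"(.{2})(.{3})(.{4})"
COLUMNS = ["f1", "f2", "f3"]


def deserialize(row, pattern=INPUT_REGEX, columns=COLUMNS):
    """Map each capture group to a column; a non-matching row yields all NULLs."""
    # Assumption: the entire row has to match, so fullmatch is used here.
    match = re.fullmatch(pattern, row)
    if match is None:
        # None stands in for SQL NULL.
        return dict.fromkeys(columns)
    return dict(zip(columns, match.groups()))


print(deserialize("abcdefghi"))  # 9 characters: splits into 2 + 3 + 4
print(deserialize("abc"))        # too short to match: every column is None
```

Testing a pattern against a few real rows this way, before putting it into `input.regex`, can help explain why a table returns only NULL values, although differences between Python and Java regex syntax still need to be checked separately.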

15 questions
1 vote, 1 answer

RegEx to create AWS Athena Table (RegexSerDe)

I am trying to create an AWS Athena table based on logs stored in S3. I intend to use a regex to create the table, but I could not find one that works for me: CREATE EXTERNAL TABLE `dev_logs`( `date_time` string COMMENT '', `type` string…
Moin
1 vote, 1 answer

regex for access log in hive serde with newline

Using AWS Athena, I am trying to import a CSV file whose data includes newlines. The import uses the Hive SerDe format. The data looks like this (each value is enclosed in double quotes): "DataA"|"DataB"|"DataC" "Data1"|"Data2 with new…
1 vote, 1 answer

Hive - Regex for the SYSLOG/ERRORLOG

I want to query the syslog (basically it's my SQL error log) using Athena. Here is my sample data: 2019-09-21T12:19:32.107Z 2019-09-21 12:19:24.17 Server Buffer pool extension is already disabled. No action is necessary.…
TheDataGuy
1 vote, 1 answer

hive create table input.regex - filter out all rows starting with a char

I want to create a table in Hive: CREATE TABLE table ( a string, b string ) PARTITIONED BY ( pr_filename string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ('input.regex'='reg_exp') ; but…
jmt
1 vote, 1 answer

Can we create several entries from one line?

My logs look like this: client_id;event_1;event_2;event3 I would like to get an SQL table like this: client_id | event --------------------- ... | event_1 ... | event_2 ... | event_3 I am new to Hive, it seems to me…
Cinn
1 vote, 1 answer

Insert data in hive using multidelimeter

How do I insert data into Hive when the data uses multiple delimiters and the delimiter between columns is not specified? Below is my data: 25380 20130101 2.514 -135.69 58.43 8.3 1.1 4.7 4.9 5.6 0.01 C 1.0 -0.1 0.4 97.3 …
shael
1 vote, 1 answer

Regex SerDe doesn't support the serialize() method error

I have a table structure as below. CREATE TABLE db.TEST( f1 string, f2 string, f3 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex'='(.{2})(.{3})(.{4})' ) STORED AS INPUTFORMAT …
Dileep Dominic
1 vote, 1 answer

Hive with regexserde property doesn't work properly

I used regex101 website to validate my regex: ([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)" "(.*?)" "(.*?)" It works fine for the log below 66.240.70.141 - - [01/Mar/2018:06:16:46 +0000] "GET…
M. Costa
1 vote, 0 answers

Hive RegexSerDe Multiline Log matching showing NULL values after one line

I am trying to load Apache Tomcat logs in which a single row spans multiple lines, but only one line is loaded and the remaining lines show NULL values until the next record is reached. I have tried regexes from earlier posts but they are not working, the…
1 vote, 1 answer

insert data into table using csv file in HIVE

CREATE TABLE `rk_test22`( `index` int, `country` string, `description` string, `designation` string, `points` int, `price` int, `province` string, `region_1` string, `region_2` string, `taster_name` string, `taster_twitter_handle` string,…
1 vote, 1 answer

Hive table property to consider consecutive delimiters as one delimiter

jan 18 "value1 is null" feb 4 "value1 is null" In the above dataset there are consecutive delimiters between the 1st and 2nd columns in the second row. How can I handle consecutive delimiters as one delimiter?
Mohan M
0 votes, 1 answer

Can i filter the files(filenames) from which i wanted to create a hive table in databricks?

I have server logs enabled on an S3 bucket. The log files have names such as 2023-02-16-00-16-16-A4210A3BBB675006. The first part of the filename is the date. I extract various fields from the contents of the files using the regex SerDe and create a hive…
0 votes, 1 answer

Unable to perform select count(*) when using RegexSerde on Hive

I am reading data from a fixed-length flat file, and I applied the following script: CREATE EXTERNAL TABLE `test_table`.`test_data` (test_column1 STRING, test_column2 STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'…
Kong Yong
0 votes, 1 answer

Prevent Inserting NULL while using Hive Regex Serde

RegexSerDe uses a regular expression (regex) to deserialize data. It doesn't support data serialization. It deserializes the data using the regex and extracts groups as columns. In the deserialization stage, if a row does not match the regex, then all…
-3 votes, 1 answer

creation a hive table for unstructed data

How do I create a Hive table for the data below? 3.94.78.5 - 69827 [15/Sep/2013:23:58:36 +0100] "GET /KBDOC-00033.html HTTP/1.0" 19.38.140.62 - 21475 [15/Sep/2013:23:58:34 +0100] "GET /KBDOC-00033.html HTTP/1.0" 19.38.140.62 - 21475…
Sai Mammahi