I need to construct the input.regex for a Hive RegexSerDe table over pipe-delimited data.
Sample data:
CEF:0|Microsoft|Microsoft Windows||Microsoft-Windows-Security-Auditing:434|An account was logged off.|Low| eventId=260 externalId=44 msg=Network: A user or computer logged on to this computer from the network. categorySignificance=/Informational categoryBehavior=/Access/Stop categoryDeviceGroup=/Operating System catdt=Operating System categoryOutcome=/Success categoryObject=/Host/Operating|Vista ad.EventIndex=-972 ad.WindowsParserFamily=Windows 2008 R2|2008|7|Vista ad.WindowsVersion=Windows Server
We need to split off the first seven columns on the pipe character and treat everything after the seventh pipe as a single column, even though that remainder can itself contain pipes.
DDL: (CEF STRING, Vendor STRING, Product STRING, Version STRING, Signature STRING, Name STRING, Severity STRING, Extension STRING)
So the sample data should be mapped to columns as follows:
Col1: CEF:0
Col2: Microsoft
Col3: Microsoft Windows
Col4: (empty)
Col5: Microsoft-Windows-Security-Auditing:434
Col6: An account was logged off.
Col7: Low
Col8: eventId=260 externalId=44 msg=Network: A user or computer logged on to this computer from the network. categorySignificance=/Informational categoryBehavior=/Access/Stop categoryDeviceGroup=/Operating System catdt=Operating System categoryOutcome=/Success categoryObject=/Host/Operating|Vista ad.EventIndex=-972 ad.WindowsParserFamily=Windows 2008 R2|2008|7|Vista ad.WindowsVersion=Windows Server
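To make the intended split concrete, here is a minimal sketch in Python rather than Hive. The pattern and the helper name `split_cef` are assumptions for illustration only, not a tested input.regex; the sample line is an abbreviated copy of the data above, and in a Hive DDL string the backslashes would presumably need to be doubled.

```python
import re

# Seven pipe-free fields, then everything else (pipes included) as field 8.
# \s* drops the space that follows the seventh pipe in the sample data.
CEF_PATTERN = re.compile(
    r"^([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|\s*(.*)$"
)

# Abbreviated version of the sample line from the question.
SAMPLE = (
    "CEF:0|Microsoft|Microsoft Windows||"
    "Microsoft-Windows-Security-Auditing:434|An account was logged off.|Low| "
    "eventId=260 externalId=44 categoryObject=/Host/Operating|Vista ad.EventIndex=-972"
)


def split_cef(line):
    """Return the eight fields as a list, or None if the line does not match."""
    match = CEF_PATTERN.match(line)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    for number, value in enumerate(split_cef(SAMPLE), start=1):
        print(f"Col{number}: {value}")
```

Because the first seven groups use `[^|]*`, an empty field such as Col4 still matches, and only the last group is allowed to contain pipes.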
What should be the input.regex?
Also, is it possible to have the key=value pairs in the last column mapped to a Map data type using this regex?
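For clarity, the map I have in mind for the key=value column would look like the result of the Python sketch below. This only illustrates the desired output and is not a Hive solution; the helper name `parse_extension` and its pattern are hypothetical, and the pattern assumes keys consist of word characters and dots and that a value runs until the next ` key=` token or the end of the line.

```python
import re

# A key is a run of word characters and dots; its value extends lazily until
# whitespace followed by the next "key=" token, or until the end of the string.
PAIR_PATTERN = re.compile(r"([\w.]+)=(.*?)(?=\s+[\w.]+=|$)")


def parse_extension(extension):
    """Turn 'k1=v1 k2=v2 ...' into a dict; values may contain spaces and pipes."""
    return {key: value for key, value in PAIR_PATTERN.findall(extension)}


if __name__ == "__main__":
    # Leading part of the extension column from the sample data.
    text = (
        "eventId=260 externalId=44 msg=Network: A user or computer logged on "
        "to this computer from the network. categorySignificance=/Informational"
    )
    for key, value in parse_extension(text).items():
        print(f"{key} -> {value}")
```

A plain split on spaces would not be enough here, because values such as the msg text contain spaces themselves.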