I have log data and I want to extract each piece of information into its own variable.

The following is a sample log line:

{:id=>306, :name=>"bblite", :cpu=>{:quota=>4, :allocated=>4, :actual=>0}, :memory=>{:quota=>8192, :allocated=>8192, :actual=>8578}, :cluster_stats=>{"wc1104"=>{:cpu=>0, :mem=>8578}}}

I need one variable that holds all the ids, one that holds all the names, one that holds the CPU values, and one that holds all the cluster stats.

The following is a portion of my Pig script. I can extract the ids, but I have no idea how to extract the rest using regex.

. . .

matching_messages = FILTER raw_lines BY (LOWER(message) MATCHES '.*cc_altus-plaform.*');

ids = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'id=>\\d*',0);

names = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'name=>\\"\\",',0);

line_with_date = FOREACH matching_messages GENERATE
DateFormatter(timestamp) AS formatted_time: chararray, message;

DUMP names;

1 Answer

The following regexes are the ones I wrote, and they work for me:

id = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'(?<=id=>)\\d*',0);

name = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'name=>\\"[\\w]*\\"',0);

cpu = FOREACH matching_messages GENERATE REPLACE( REGEX_EXTRACT(message, 'cpu=>\\{.*?\\}',0), ',','');

memory = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'memory=>\\{.*?\\}',0);

cluster = FOREACH matching_messages GENERATE REGEX_EXTRACT(message,'cluster_stats=>\\{.*?\\}',0);
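
The patterns above can be sanity-checked outside Pig against the sample line from the question. The following is only a sketch in Python, on the assumption that Python's `re` module treats these particular patterns the same way as the regex engine Pig uses. The `extract` helper is hypothetical and only approximates REGEX_EXTRACT by returning one group of the first match. The last two patterns are variants that are not part of the original answer: one uses a capture group to return only the name, the other assumes `cluster_stats` is the last key on the line.

```python
import re

# Sample log line from the question.
line = ('{:id=>306, :name=>"bblite", '
        ':cpu=>{:quota=>4, :allocated=>4, :actual=>0}, '
        ':memory=>{:quota=>8192, :allocated=>8192, :actual=>8578}, '
        ':cluster_stats=>{"wc1104"=>{:cpu=>0, :mem=>8578}}}')


def extract(pattern, text, group=0):
    """Hypothetical stand-in for REGEX_EXTRACT: one group of the first match, or None."""
    m = re.search(pattern, text)
    return m.group(group) if m else None


# Patterns from the answer, with Pig's doubled backslashes collapsed to single ones.
id_ = extract(r'(?<=id=>)\d*', line)
name = extract(r'name=>"[\w]*"', line)
cpu = extract(r'cpu=>\{.*?\}', line).replace(',', '')
memory = extract(r'memory=>\{.*?\}', line)
# The lazy .*? stops at the FIRST closing brace, so the nested braces are cut short.
cluster = extract(r'cluster_stats=>\{.*?\}', line)

# Variant (assumption, not in the answer): capture group 1 returns only the value.
name_only = extract(r'name=>"(\w*)"', line, 1)
# Variant (assumption: cluster_stats is the last key, so the line ends with "}}").
cluster_full = extract(r'cluster_stats=>(\{.*\})\}$', line, 1)

for label, value in [('id', id_), ('name', name), ('cpu', cpu),
                     ('memory', memory), ('cluster', cluster),
                     ('name_only', name_only), ('cluster_full', cluster_full)]:
    print(label, '->', value)
```

If the same holds in Pig, the `cluster` value ends after `:mem=>8578}` and loses the outer closing brace, which is worth checking before relying on it for nested data.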
E.Z