I followed this Stack Overflow question, which shows how to count rows in Pig.
The problem I found is that this approach is incredibly time-consuming when I apply a regex filter and other operations before counting the rows of the filtered relation.
Here is my code:
all_data = LOAD '/logs/chat1.log' USING TextLoader() AS (line:chararray);
match_filter_1 = FILTER all_data BY (line MATCHES 'some regex');
inputGroup = GROUP match_filter_1 ALL;
totalLine = FOREACH inputGroup GENERATE COUNT(match_filter_1);
DUMP totalLine;
So, is there any way to get the result faster?