
I'm searching large log files for specific words. I've found some basic solutions that work when the words in the string are separated by whitespace. What I need, though, is to find all occurrences of a specific word that can be surrounded by any characters.

For example, when looking for "hello", "abchello" should return 1 and "##hello123...@456hello8" should return 2.

I could do that with basic for loops, but I'd like to use streams (and perhaps parallel streams) as much as possible for the speed gain, since I'm going through large files.

The following seems to match any form of "hello", but it counts only one occurrence per line and then moves on to the next line:

BufferedReader bufferReader = Files.newBufferedReader(Paths.get(file));
// Counts the lines that contain "hello", not the individual occurrences
long count = bufferReader.lines().filter(l -> l.matches(".*hello.*")).count();
Lofi Peng
  • What do you mean "stops at the first one"? `lines.filter()` iterates over individual lines, yes, and your `matches` method is the same as just `endsWith("hello")` if matched on the whole line, which would be the "last one", not "first" – OneCricketeer Jul 14 '22 at 19:02
  • Perhaps you want to [Count a substring](https://stackoverflow.com/questions/45888605/how-to-find-the-count-of-substring-in-java)? Not check if a string exists at all? – OneCricketeer Jul 14 '22 at 19:03
  • @OneCricketeer What I mean is that if the line is "##hello123...@456hello8", it will only count one of the "hello"s and go to the next line. What I need is to count all of them in that line. – Lofi Peng Jul 14 '22 at 19:10
  • Please see the post I linked to – OneCricketeer Jul 14 '22 at 19:10
  • @OneCricketeer I had a look. I have no idea how to use that in a stream. I don't want to use temps or for loops unless there is really no other way – Lofi Peng Jul 14 '22 at 19:13
  • You'd use something like `lines().mapToInt(l -> countMatches(l, "hello")).sum()`. The for loop within the function is necessary; not everything can be a stream – OneCricketeer Jul 14 '22 at 19:15
  • `grep -o hello * |sed 's/\([^:]\):.*/\1/' |uniq -c` if you don't have to use Java. – David Conrad Jul 14 '22 at 20:00
  • @OneCricketeer Thanks, "mapToInt" is definitely a better approach. And you are right about the required for loop. I found another solution using "split", which uses while loops. – Lofi Peng Jul 15 '22 at 08:22
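
The `mapToInt` suggestion in the comments relies on a `countMatches` function that the thread does not show. A minimal sketch of what it could look like, assuming non-overlapping matches are wanted; the class name `SubstringCount` and both method names are hypothetical and not taken from any library:

```java
import java.util.stream.Stream;

public class SubstringCount {

    // Hypothetical helper: counts non-overlapping occurrences of word in line.
    static int countMatches(String line, String word) {
        if (word.isEmpty()) {
            return 0; // guard: an empty search term would otherwise loop forever
        }
        int count = 0;
        int from = 0;
        while ((from = line.indexOf(word, from)) != -1) {
            count++;
            from += word.length(); // continue searching after this match
        }
        return count;
    }

    // Sums the per-line counts over a stream of lines.
    static int countInLines(Stream<String> lines, String word) {
        return lines.mapToInt(l -> countMatches(l, word)).sum();
    }

    public static void main(String[] args) {
        Stream<String> lines =
                Stream.of("abchello", "##hello123...@456hello8", "no match");
        System.out.println(countInLines(lines, "hello")); // prints 3
    }
}
```

The loop stays inside the helper, so the stream pipeline itself remains a single `mapToInt(...).sum()` call.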

1 Answer


Using org.apache.commons.lang3.StringUtils#countMatches:

BufferedReader bufferReader = Files.newBufferedReader(Paths.get(file));
// Sum the number of occurrences of "hello" found in each line
int count = bufferReader.lines().mapToInt(line -> StringUtils.countMatches(line, "hello")).sum();

More ways to count matches: Occurrences of substring in a string
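
If adding Apache Commons Lang is not an option, the same count can be sketched with the JDK alone. The following assumes Java 9 or later (for `Matcher.results()`) and a UTF-8 file; the class name `RegexWordCount` and its method names are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class RegexWordCount {

    // Counts non-overlapping occurrences of word across all lines.
    static long countOccurrences(Stream<String> lines, String word) {
        // Pattern.quote treats the word literally, so "." or "*" are not regex syntax
        Pattern pattern = Pattern.compile(Pattern.quote(word));
        return lines.mapToLong(line -> pattern.matcher(line).results().count()).sum();
    }

    // Reads the file lazily; try-with-resources closes the underlying file handle.
    static long countInFile(Path file, String word) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return countOccurrences(lines, word);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for a real log file
        Path tmp = Files.createTempFile("sample", ".log");
        Files.write(tmp, List.of("abchello", "##hello123...@456hello8"));
        System.out.println(countInFile(tmp, "hello")); // prints 3
        Files.delete(tmp);
    }
}
```

A compiled `Pattern` can be shared between threads, so calling `.parallel()` on the line stream should be safe here; whether it is actually faster on a given file is something to measure rather than assume.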

FogZ
  • Thanks for the solution and the link. This is very close to OneCricketeer's suggestion in the comments above. – Lofi Peng Jul 15 '22 at 08:23