My application needs to parse some large string data, which means I am heavily using the Split, IndexOf and Substring methods of the string class. I use the StringBuilder class wherever I have to do any concatenation. However, while the application is doing this parsing, its CPU usage goes high (60-70%). I am guessing that these string API calls are what is causing the high CPU usage, especially since the data is big (a typical string Length is 400K). Any ideas on how I can verify what is causing the CPU usage to go that high, and any suggestions on how to bring it down?
-
You haven't specified why high CPU usage is a bad thing. Are you trying to leave enough 'breathing room' for other processes/threads? – vlad Mar 21 '11 at 14:30
-
Profile it and look for the bottleneck. Are you sure it is not caused by an I/O operation (reading from/writing to disk)? – Lukasz Madon Mar 21 '11 at 14:32
-
@Vlad: In general, wouldn't you want CPU usage under control? When is high CPU usage considered a good thing? – palm snow Mar 21 '11 at 14:53
-
String parsing always generates 100% CPU usage; you are asking it to do real work. Finding out why you lose 30-40% should be your concern. It is probably I/O, reading/writing file data. There is not much you can do about that, but doing it in another thread so you can overlap it with the parsing can help. Getting it up to 100% is difficult to achieve. – Hans Passant Mar 21 '11 at 15:35
-
@palm-snow define "under control". The fact that it's running at 100% means it doesn't have to wait for memory and/or I/O. It therefore means that it's doing what you asked it to do NOW, as opposed to a little bit later. I think that's a good thing. Like @Hans Passant mentioned, achieving 100% is usually a goal, not a problem. – vlad Mar 21 '11 at 16:09
4 Answers
One thing to check is that you are passing the StringBuilder around as much as possible, rather than creating a new one and then needlessly returning its ToString().
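As a rough illustration of that point, here is a minimal sketch in which helpers append into the caller's StringBuilder and ToString() is called only once at the end. The names `RecordFormatter`, `AppendField` and `FormatRecord` are made up for this example and are not from the question.

```csharp
using System.Text;

public static class RecordFormatter
{
    // Hypothetical helper: appends into the caller's builder instead of
    // creating its own StringBuilder and returning an intermediate string.
    public static void AppendField(StringBuilder sb, string name, string value)
    {
        sb.Append(name).Append('=').Append(value).Append(';');
    }

    // Builds the whole record with one builder and a single ToString() call.
    public static string FormatRecord(string[] names, string[] values)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < names.Length; i++)
        {
            AppendField(sb, names[i], values[i]);
        }
        return sb.ToString();
    }
}
```

The alternative, where each helper returns a string that the caller then appends, allocates one throwaway string per call.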
A much bigger gain though can be made if you process the data as smaller strings, read from a stream. Of course, this depends on just what sort of manipulation you are doing, but if at all possible, read your data from a StreamReader (or similar depending on the source) in small chunks, and then write it to a StreamWriter.
Often changes are only applicable within a given line of text, which makes the following pattern immediately useful:
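    // Read one line at a time, transform it, and write it straight out,
    // so the full input never has to be held in memory as a single string.
    using (StreamReader sr = new StreamReader(sourceInfo))
    using (StreamWriter sw = new StreamWriter(destInfo))
        for (string line = sr.ReadLine(); line != null; line = sr.ReadLine())
            sw.WriteLine(ManipulateString(line));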
In other cases where this doesn't apply, there are still ways to split the string into chunks for processing.
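One possible way to chunk input that is not line-oriented is to read fixed-size character buffers. The sketch below assumes the manipulation can be applied one character at a time, which is a simplification: anything that spans a chunk boundary needs extra carry-over logic. `ChunkedCopy`, `Process` and the `transform` delegate are hypothetical names.

```csharp
using System;
using System.IO;

public static class ChunkedCopy
{
    // Reads up to bufferSize characters at a time, transforms them in place,
    // and writes them out, so memory use is bounded by the buffer size.
    public static void Process(TextReader reader, TextWriter writer,
                               Func<char, char> transform, int bufferSize = 4096)
    {
        var buffer = new char[bufferSize];
        int read;
        // Read returns the number of characters read, or 0 at end of input.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                buffer[i] = transform(buffer[i]);
            }
            writer.Write(buffer, 0, read);
        }
    }
}
```

Because it takes a TextReader and a TextWriter, the same method works with a StreamReader/StreamWriter pair or with in-memory readers and writers.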

To find out where the CPU usage is coming from: see What Are Some Good .NET Profilers?
To reduce CPU usage: it depends, of course, on what's actually taking the time. You might, for instance, consider working not with actual substrings but with little objects encoding where they are within the big strings they came from. (There is no guarantee that this will actually be an improvement.) Very likely, when you profile your code there will be a few things that jump out at you as problems; they may well be things you'd never have guessed, and they may be very easy to fix as soon as you know they need fixing.
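A minimal sketch of such a "little object", assuming a hypothetical `StringSlice` type (not a framework class) that records a start and length within the original string and only allocates a new string when ToString() is called:

```csharp
using System;

// Hypothetical type: a view over part of a larger string that copies nothing
// until ToString() is called.
public readonly struct StringSlice
{
    public string Source { get; }
    public int Start { get; }
    public int Length { get; }

    public StringSlice(string source, int start, int length)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (start < 0 || length < 0 || start > source.Length - length)
            throw new ArgumentOutOfRangeException();
        Source = source;
        Start = start;
        Length = length;
    }

    // Character at an offset relative to the start of the slice.
    public char this[int i] => Source[Start + i];

    // Searches only within the slice; returns an offset relative to the
    // slice, or -1 if the character is not found.
    public int IndexOf(char c)
    {
        int pos = Source.IndexOf(c, Start, Length);
        return pos < 0 ? -1 : pos - Start;
    }

    // Narrows the view further without copying any characters.
    public StringSlice Slice(int start, int length)
    {
        if (start < 0 || length < 0 || start > Length - length)
            throw new ArgumentOutOfRangeException();
        return new StringSlice(Source, Start + start, length);
    }

    // The only place a new string is allocated.
    public override string ToString() => Source.Substring(Start, Length);
}
```

Whether this beats plain Substring calls depends on the workload, so it is worth measuring before and after.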

Further to Jon's answer: if your parser does not need to backtrack, i.e. it always reads through the string in a forward direction, and the source of the string is not a file or network stream that you can use a StreamReader with, just wrap your String in a StringReader instead, e.g.
//Create a StringReader using the String variable data which has your String in it
//A StringReader is just a TextReader implementation for Strings
StringReader reader = new StringReader(data);
//Now do whatever manipulation on the string you want...
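To make the forward-only reading concrete, here is a small sketch that walks a string line by line through a StringReader. `ForwardParser` and `NonEmptyLines` are hypothetical names, and collecting non-empty lines is only a stand-in for whatever parsing is actually needed.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ForwardParser
{
    // Walks the string once, front to back, without calling Split on the
    // whole input; each line is produced only when it is needed.
    public static List<string> NonEmptyLines(string data)
    {
        var result = new List<string>();
        using (var reader = new StringReader(data))
        {
            string line;
            // ReadLine returns null once the end of the string is reached.
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length > 0)
                {
                    result.Add(line);
                }
            }
        }
        return result;
    }
}
```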

-
+1 Yep, that can help, and would be worth a go if the string simply can't be got from a stream. However, if the string is got from a stream (even indirectly, like a massive Request.Form value that is ultimately from Request.InputStream with some processing done for you), then moving to take it directly from the stream can be a big gain. – Jon Hanna Mar 21 '11 at 17:40
-
Yeah I've written a lot of streaming parsers particularly in the last year or two and I always try and use `StreamReader` wherever possible – RobV Mar 21 '11 at 17:42
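In your case you are typically working with very large strings (a Length of around 400K). For operations on large strings, a rope data structure may help: it represents a string as a tree of smaller pieces, so concatenation and edits in the middle do not have to copy the whole string. Whether it helps here depends on which operations dominate.
Please refer to the links below for more information: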
https://iq.opengenus.org/rope-data-structure/
https://www.geeksforgeeks.org/ropes-data-structure-fast-string-concatenation/
STL ropes in C++: https://www.geeksforgeeks.org/stl-ropes-in-c/
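For a feel of the idea in C#, here is a deliberately minimal sketch of a rope. The `Rope` class below is hypothetical and not a framework type; it supports only concatenation, indexing and flattening, and it does no rebalancing, so a long chain of Concat calls produces a deep tree and slow indexing. A production rope would also need splitting and balancing.

```csharp
using System;
using System.Text;

// Hypothetical minimal rope: leaves hold text, inner nodes join two ropes.
public abstract class Rope
{
    public abstract int Length { get; }
    public abstract char this[int index] { get; }
    internal abstract void AppendTo(StringBuilder sb);

    public static Rope From(string text) => new Leaf(text);

    // Concatenation creates one small node and copies no characters.
    public Rope Concat(Rope other) => new Node(this, other);

    // Flattening copies every character exactly once.
    public override string ToString()
    {
        var sb = new StringBuilder(Length);
        AppendTo(sb);
        return sb.ToString();
    }

    private sealed class Leaf : Rope
    {
        private readonly string _text;

        public Leaf(string text)
        {
            _text = text ?? throw new ArgumentNullException(nameof(text));
        }

        public override int Length => _text.Length;
        public override char this[int index] => _text[index];
        internal override void AppendTo(StringBuilder sb) => sb.Append(_text);
    }

    private sealed class Node : Rope
    {
        private readonly Rope _left;
        private readonly Rope _right;
        private readonly int _length;

        public Node(Rope left, Rope right)
        {
            _left = left;
            _right = right;
            _length = left.Length + right.Length;
        }

        public override int Length => _length;

        // Descend into whichever half contains the requested index.
        public override char this[int index] =>
            index < _left.Length ? _left[index] : _right[index - _left.Length];

        internal override void AppendTo(StringBuilder sb)
        {
            _left.AppendTo(sb);
            _right.AppendTo(sb);
        }
    }
}
```

For a workload dominated by Split, IndexOf and Substring rather than by edits, a rope may not be the main win, so profiling first is still advisable.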
