6

I need to estimate the localization effort for a legacy project. I'm looking for a tool that I could point at a directory, and it would:

  • Parse all *.cs files in the directory structure
  • Extract all C# string literals from the code
  • Count the total number of occurrences of the strings

Do you know of any tool that could do that? Writing one would be simple, but if some time can be saved, then why not save it?

Marcin Seredynski
  • For each file in DirectoryInfo.GetFiles("*.cs", SearchOption.AllDirectories), for each line in the file, check for a string literal "...". Seems like something that should not take more than 8 hours to build. – CodingBarfield Apr 07 '11 at 09:19
  • Writing a tool = 8h. Asking a question = 5min. Using a tool = 5min ;) – Marcin Seredynski Apr 07 '11 at 09:28
  • You have to be careful with comments. Do you want to extract strings in comments? Or do you want to eliminate comments first? Also, @"..." is not the same as "..." because the former can include line breaks. Also escaped characters must be handled (e.g. \" cannot be counted as terminating a string). It is not as simple as it looks. – Stephen Chung Apr 07 '11 at 09:39

4 Answers

5

Use ILDASM to disassemble your .DLL / .EXE.

I just use the options to dump everything, and you get an .il file with a "User Strings" section:

User Strings
-------------------------------------------------------
70000001 : (14) L"Starting up..."
7000001f : (12) L"progressBar1"
70000039 : (21) L"$this.BackgroundImage"
70000065 : (10) L"$this.Icon"
7000007b : ( 6) L"Splash"

Now, if you want to know how many times a certain string is used, search for an "ldstr" instruction like this:

IL_003c:  /* 72   | (70)000001       */ ldstr      "Starting up..." /* 70000001 */

I think this will be a lot easier to parse than C#.
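
As a rough illustration of the counting step, the following Python sketch tallies `ldstr` operands in the text of an .il dump. The name `count_ldstr` is hypothetical, and the regular expression assumes the operand appears as a single double-quoted literal on the same line as the instruction, as in the sample above. Lines where the operand is written in any other form are simply skipped, so the totals are a lower bound.

```python
import re
from collections import Counter

# Matches: ldstr "some text", allowing backslash-escaped characters inside.
LDSTR = re.compile(r'\bldstr\s+"((?:[^"\\]|\\.)*)"')


def count_ldstr(il_text):
    """Count how often each string literal is loaded by an ldstr instruction."""
    counts = Counter()
    for line in il_text.splitlines():
        match = LDSTR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Reading the dump with `open(path).read()` and passing it to `count_ldstr` yields per-string counts; `sum(counts.values())` is the total number of loads.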

GvS
1

Doing a quick search, I found the following tool that may or may not be useful to you.

http://www.devincook.com/goldparser/

I also found another SO user who was trying to do something similar.

Regex to parse C# source code to find all strings

Josh Smeaton
0

Well, if you have hardcoded strings, you first need to estimate your i18n effort (un-hardcoding them can be quite painful). Another issue: you need to count translatable words, not distinct strings, because word count is the input for translation providers. And even though a string might appear duplicated, it could be translated differently depending on the context, so you don't need to care about "distinct"; you just have to count all words. That's how localization works, in my experience.
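
To illustrate counting words rather than distinct strings, here is a small Python sketch. The name `count_words` and the definition of a "word" (a run of letters, optionally joined by an apostrophe or hyphen) are assumptions made for illustration; translation providers may count differently, and non-translatable strings such as control names still have to be filtered out by hand.

```python
import re

# A "word" here is a run of letters, optionally joined by ' or - (an assumption).
WORD = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def count_words(literals):
    """Sum the word counts over every occurrence, duplicates included."""
    return sum(len(WORD.findall(text)) for text in literals)
```

Passing the full list of occurrences, not a de-duplicated set, keeps repeated strings in the total, which matches the point about context-dependent translation.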

Paweł Dyda
  • Thanks Pawel. The tool I need is only for the rough estimation effort. I have some assumptions to work with and will have to go through every string occurrence manually. – Marcin Seredynski Apr 07 '11 at 09:34
  • I'd love to give you the name of a tool, but unfortunately the only one I know of is internal to my company. Sorry about that. – Paweł Dyda Apr 07 '11 at 09:49
-1

In most development work, you should keep your strings external to your program source code. In your case, could you spare the effort to extract the strings into a resource file?

If so, then you can make use of the default localization solution in .NET, i.e. resource files such as

resource.resx,

resource.fr.resx,

resource.es.resx

which store strings for different locales.

Update:

The actual implementation depends on your project architecture/technology. Resource files aren't necessarily the best way to do this, but they are the easiest and the recommended way in .NET.

Like in this article

A few more tutorials

Winfred