I have an open-source project whose documentation is currently in French. The documentation is generated from XML comments in code, using Sandcastle. Now I would like to translate the documentation into English and provide it in both languages, but I don't really know where to start...

  • Do I need to extract the XML comments from the code and put them in a separate file? If yes, are there any tools to automate the process?
  • I'm using Sandcastle Help File Builder to build the documentation; do I need to create a separate project to build the doc in English, or can it be done from the same project?
  • Are there any tools to help with the translation process, e.g. by displaying the original and translated doc side by side?

I'm also interested in links on how to produce multilingual documentation, as I couldn't find anything useful on Google...

Thomas Levesque
  • That's the reason why we write all comments in English :) – Andreas Jun 03 '11 at 08:44
  • @Andreas, that's what I usually do... but this project is a special case, as it was initially intended for members of a French speaking community (Developpez.com). Now I would like to broaden the audience of this library... – Thomas Levesque Jun 03 '11 at 10:08

5 Answers

21

One strategy, which would require some coordination with the Sandcastle XSLT files, would be to use the xml:lang attribute on your XML documentation. Visual Studio 2010 allows several tags of the same kind to coexist (although you may get complaints about duplicate tags).

/// <summary>
/// Gets or sets the fill size of the load operation.
/// </summary>
/// <summary xml:lang="fr">
/// Obtient ou définit la taille de remplissage de l'opération de chargement.
/// </summary>
public int FillSize
{
    get;
    set;
}

Resulting output:

<member name="P:Namespace.MyAttribute.FillSize">
    <summary>
    Gets or sets the fill size of the load operation.
    </summary>
    <summary xml:lang="fr">
    Obtient ou définit la taille de remplissage de l'opération de chargement.
    </summary>
</member>
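
One way to get a separate output per language without touching the XSLT files would be to post-process the compiler-generated XML file before handing it to Sandcastle. The following is only a minimal Python sketch of that idea, not something Sandcastle provides. The function name `filter_language` is made up for illustration, and it assumes that elements without an xml:lang attribute are in the default language (English here) and serve as a fallback when no translation exists.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def filter_language(doc_xml, lang, default_lang="en"):
    """Return the XML doc file content with only one language kept per member.

    Hypothetical helper: untagged elements are assumed to be in default_lang
    and are kept as a fallback when the requested language is missing.
    """
    root = ET.fromstring(doc_xml)
    for member in list(root.iter("member")):
        children = list(member)
        # (tag, name) pairs that have a version in the requested language.
        translated = {
            (c.tag, c.get("name"))
            for c in children
            if c.get(XML_LANG, default_lang) == lang
        }
        for child in children:
            child_lang = child.get(XML_LANG, default_lang)
            key = (child.tag, child.get("name"))
            is_fallback = child_lang == default_lang and key not in translated
            if child_lang == lang or is_fallback:
                # Drop the marker so downstream tools see ordinary tags.
                child.attrib.pop(XML_LANG, None)
            else:
                member.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Running it once per language (for example with `lang="fr"` and `lang="en"`) would give one XML file per language, and each could then be used as the documentation source of its own build.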
user7116
  • Thanks, I like this approach! It's similar to the solution suggested by Remi Bourgarel, but it feels cleaner... – Thomas Levesque Jun 07 '11 at 15:34
  • I never found the time to actually try it, but it's the answer I like the most, so I accept it ;) – Thomas Levesque Oct 13 '11 at 08:26
  • I haven't been able to find any documentation about how to make this work. When I tried it, I just get both languages showing up in my chm file. Any hints on how to do that "coordination with the Sandcastle XSLT files"? – Scott Solmer Jul 20 '17 at 15:00
6

We did it like this:

  • We added an "<EN>" element after the French text inside each of our documentation tags, like this:

    /// <summary>
    /// Description du produit
    /// <EN>Product's description</EN>
    /// </summary>
    
  • Then in the Sandcastle XSLT file (Development/presentation/vs2005/transforms/main_sandcastle.xsl) we went to the template matching "param" (line 95 for us) and added:

    <span class="trad"> 
        <xsl:value-of select="msxsl:node-set(.)/EN"/>
    </span>
    
  • Then you can change the CSS to display the translation in your favorite color.
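
If two separate documentations are wanted instead of both languages on one page, the same inline "<EN>" markup could also be split before Sandcastle runs, as an alternative to maintaining two templates. This is a rough Python sketch under that assumption. The name `split_en_tags` is invented for illustration, and elements without an "<EN>" child are simply left in the original language in both outputs.

```python
import copy
import xml.etree.ElementTree as ET


def _drop_child(parent, child):
    """Remove child from parent but keep the text that followed it (its tail)."""
    siblings = list(parent)
    index = siblings.index(child)
    tail = child.tail or ""
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = siblings[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def split_en_tags(doc_xml):
    """Return (original_language_xml, english_xml) for docs with inline <EN> tags."""
    original = ET.fromstring(doc_xml)
    english = copy.deepcopy(original)

    # Original-language copy: strip every <EN> element.
    for parent in list(original.iter()):
        for en in parent.findall("EN"):
            _drop_child(parent, en)

    # English copy: replace the parent's content with the content of <EN>.
    for parent in list(english.iter()):
        en = parent.find("EN")
        if en is not None:
            new_text, new_children = en.text, list(en)
            for child in list(parent):
                parent.remove(child)
            parent.text = new_text
            parent.extend(new_children)

    return (
        ET.tostring(original, encoding="unicode"),
        ET.tostring(english, encoding="unicode"),
    )
```

Each returned string is a complete XML documentation file, so the two could be built independently.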

remi bourgarel
  • That looks promising, thanks! But it displays both languages on the same page, right? I'd like to generate 2 separate documentations... – Thomas Levesque Jun 07 '11 at 15:17
  • Indeed it displays both documentations, but you can build two templates for Sandcastle, one displaying the content of the EN tag and the other not. Or you can handle this via CSS, but it's kind of dirty. – remi bourgarel Jun 07 '11 at 15:20
  • BTW, do you put this EN tag only in the summary, or in all documentation fields (params, return values, etc)? – Thomas Levesque Jun 07 '11 at 15:24
  • I think so. I used this code a little while ago, so I'm not sure you won't need to change the XSLT file a little bit more. Maybe a single template matching the "EN" tag would be enough. – remi bourgarel Jun 07 '11 at 15:35
2

One possible strategy would be to keep a default language in the code and supply translations separately.

No matter which localized languages I ended up with, I'd choose English as the default/fallback language of the documentation.

The code structure provides the index for your translation database, for example:

Type, NameWithNamespace, OptionalParameterName

"member", "MyProject.Core.Loader.FillSize", ...

You could have a tool that would allow for translation in a UI for each namespace/member.

You can have a separate team of translators looking through the items that have no translation yet, and supply translations.

And you can start shipping translated documentation with a release as soon as the share of translated items rises above a threshold.

A change to the default-language text would indicate that the other languages need a new translation for that item too.

Of course, if you make a major namespace-only change, you can remap the namespaces as an ad-hoc operation in the database.
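
To make the bookkeeping concrete, here is a small Python sketch of such a translation store. Everything in it is an assumption for illustration: the key shape `(kind, name, param)` follows the index above, a hash of the default-language text marks stale translations, and the names `Translation`, `translation_status`, `ready_to_ship` and `remap_namespace` as well as the 80% threshold are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Translation:
    text: str
    source_hash: str  # hash of the default-language text it was translated from


def text_hash(text):
    """Hash the text with whitespace normalized, so reformatting is not a change."""
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()


def translation_status(defaults, translations):
    """Classify every default-language item as 'ok', 'stale' or 'missing'.

    defaults:     {(kind, name, param): default-language text}
    translations: {(kind, name, param): Translation}
    """
    status = {}
    for key, source in defaults.items():
        entry = translations.get(key)
        if entry is None:
            status[key] = "missing"
        elif entry.source_hash != text_hash(source):
            status[key] = "stale"  # the default text changed after translation
        else:
            status[key] = "ok"
    return status


def ready_to_ship(status, threshold=0.8):
    """True when the share of up-to-date translations reaches the threshold."""
    if not status:
        return False
    done = sum(1 for state in status.values() if state == "ok")
    return done / len(status) >= threshold


def remap_namespace(translations, old_prefix, new_prefix):
    """Re-key translations after a namespace-only rename."""
    remapped = {}
    for (kind, name, param), entry in translations.items():
        if name == old_prefix or name.startswith(old_prefix + "."):
            name = new_prefix + name[len(old_prefix):]
        remapped[(kind, name, param)] = entry
    return remapped
```

Translators would work through the "missing" and "stale" items, and the release script could call `ready_to_ship` to decide whether a given language is included.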

If you run an open-source project, it makes sense to use a collaborative online translation tool.

One example of such a collaborative translation strategy in production is https://translations.atlassian.com/

Basically you could just step in and start contributing translations online.

It is set up to translate the products themselves, not the documentation, but the same practice applies.

George Polevoy
  • Yes, I think having the translated docs separate from the code is the only sensible way to do it in the general case. My use case was a bit different, because I never intended to provide more than 2 languages; so it was still reasonable to have both languages in the code, making it easier to keep up to date. – Thomas Levesque Sep 01 '15 at 13:38
2

Just in case anyone needs a solution, there is a NuGet package called Surviveplus.XmlCommentLocalization. Stunning!

Alex K.
1

You have yourself a tricky one. There really isn't a "best practice", since most software is developed in English.

That being said, if you look at other multilingual documentation, how do they handle this problem?

Take International Standards. Something like ISO 9001. You have the same document (ISO 9001:2008) available in English, French, Russian etc.

Or take ISO 5247-2, which is a single document in English, French and Russian.

How would you handle changes? Say I give you a patch, but my comments are only in English: what would your process be? What if you have patch A in English, patch B in Spanish and patch C in English + French?

Another option is to fork the project: keep the main branch in French with the latest build, then bring the other languages up to date in their own time.

Separating the comments in your source code would get messy to maintain. You are then basically using a resource file in your build script.

Is this a solved problem already? If you think of any large, multilingual, open-source project, how do they handle it?

Christian Payne