23

Does anyone know of a Java library that will let me parse .PO files? I simply want to create a Map of IDs and Values so I can load them into a database.

Angel O'Sphere
Mike Sickler

6 Answers

12

I searched the Internet and couldn't find an existing library, either. If you use Scala, it's quite easy to write a parser yourself, thanks to its parser combinator feature.

Call PoParser.parsePo("po file content"). The result is a list of Translation objects.

I have turned this code into a library (it can be used from any JVM language, including Java, of course!): https://github.com/ngocdaothanh/scaposer

import scala.util.parsing.combinator.JavaTokenParsers

trait Translation

case class SingularTranslation(
  msgctxto: Option[String],
  msgid:    String,
  msgstr:   String) extends Translation

case class PluralTranslation(
  msgctxto:    Option[String],
  msgid:       String,
  msgidPlural: String,
  msgstrNs:    Map[Int, String]) extends Translation

// http://www.gnu.org/software/hello/manual/gettext/PO-Files.html
object PoParser extends JavaTokenParsers {
  // Strips the surrounding quote (") characters from each string literal
  // and concatenates the results. Escape sequences such as \n are left as-is.
  private def unquoted(quoteds: List[String]): String =
    quoteds.foldLeft("") { (acc, quoted) =>
      acc + quoted.substring(1, quoted.length - 1)
    }

  // Scala regex is single line by default
  private def comment = rep(regex("^#.*".r))

  private def msgctxt = "msgctxt" ~ rep(stringLiteral) ^^ {
    case _ ~ quoteds => unquoted(quoteds)
  }

  private def msgid = "msgid" ~ rep(stringLiteral) ^^ {
    case _ ~ quoteds => unquoted(quoteds)
  }

  private def msgidPlural = "msgid_plural" ~ rep(stringLiteral) ^^ {
    case _ ~ quoteds => unquoted(quoteds)
  }

  private def msgstr = "msgstr" ~ rep(stringLiteral) ^^ {
    case _ ~ quoteds => unquoted(quoteds)
  }

  private def msgstrN = "msgstr[" ~ wholeNumber ~ "]" ~ rep(stringLiteral) ^^ {
    case _ ~ number ~ _ ~ quoteds => (number.toInt, unquoted(quoteds))
  }

  private def singular =
    (opt(comment) ~ opt(msgctxt) ~
     opt(comment) ~ msgid ~
     opt(comment) ~ msgstr ~ opt(comment)) ^^ {
    case _ ~ ctxto ~ _ ~ id ~ _ ~ s ~ _ =>
      SingularTranslation(ctxto, id, s)
  }

  private def plural =
    (opt(comment) ~ opt(msgctxt) ~
     opt(comment) ~ msgid ~
     opt(comment) ~ msgidPlural ~
     opt(comment) ~ rep(msgstrN) ~ opt(comment)) ^^ {
    case _ ~ ctxto ~ _ ~ id ~ _ ~ idp ~ _ ~ tuple2s ~ _ =>
      PluralTranslation(ctxto, id, idp, tuple2s.toMap)
  }

  private def exp = rep(singular | plural)

  // Returns Nil when the input cannot be parsed as a PO file.
  def parsePo(po: String): List[Translation] = {
    val parseRet = parseAll(exp, po)
    if (parseRet.successful) parseRet.get else Nil
  }
}
Ngoc Dao
10

According to the Java gettext utilities manual, you can convert a PO file to a ResourceBundle class using the msgfmt --java2 program and read it using java.util.ResourceBundle or gnu.gettext.GettextResource. I suppose that is the most efficient way. Gettext Commons does exactly the same, including creating an intermediate process to call msgfmt, because it is positioned as follows:

Gettext Commons is Java library that makes use of GNU gettext utilities.
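As a rough sketch of the msgfmt route described above, the following Java code builds and runs the command line. The class name MsgfmtRunner, its method names, and the example resource name app.i18n.Messages are made up for illustration. It assumes msgfmt is on the PATH and that a Java compiler is available for msgfmt to produce the class file.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of gettext or Gettext Commons.
final class MsgfmtRunner {

    // Builds: msgfmt --java2 -d <outputDir> -r <resourceName> -l <locale> <poFile>
    static List<String> buildCommand(Path poFile, Path outputDir,
                                     String resourceName, String locale) {
        return List.of(
            "msgfmt", "--java2",
            "-d", outputDir.toString(),   // base directory for the generated class
            "-r", resourceName,           // bundle base name, e.g. app.i18n.Messages
            "-l", locale,                 // locale of the PO file, e.g. de
            poFile.toString());
    }

    // Runs msgfmt as a child process and returns its exit code (0 means success).
    static int compile(Path poFile, Path outputDir,
                       String resourceName, String locale)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                buildCommand(poFile, outputDir, resourceName, locale))
            .inheritIO()
            .start();
        return process.waitFor();
    }
}
```

Once the output directory is on the classpath, the generated class should be loadable with something like ResourceBundle.getBundle("app.i18n.Messages", new Locale("de")).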

If you still want a pure Java library, then the only way I see is to write your own library for parsing this format, i.e. port the msgfmt source code from C to Java. But I'm not sure it would be faster than creating a process and running the C program.
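If only a Map of IDs and values is needed, a hand-written reader can be much smaller than a port of msgfmt. Below is a minimal sketch, not a complete PO parser: the class SimplePoReader is hypothetical, it ignores msgctxt, plural forms and comments, and it only unescapes the most common escape sequences.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal reader: handles only msgid/msgstr pairs.
final class SimplePoReader {
    private enum Field { NONE, MSGID, MSGSTR }

    static Map<String, String> parse(String po) {
        Map<String, String> result = new LinkedHashMap<>();
        StringBuilder id = new StringBuilder();
        StringBuilder str = new StringBuilder();
        Field current = Field.NONE;
        boolean haveId = false;
        boolean haveStr = false;

        for (String raw : po.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("msgid ")) {
                // A new entry starts: store the previous one if it was complete.
                if (haveId && haveStr) result.put(id.toString(), str.toString());
                id.setLength(0);
                str.setLength(0);
                id.append(unquote(line.substring(6)));
                current = Field.MSGID;
                haveId = true;
                haveStr = false;
            } else if (line.startsWith("msgstr ")) {
                str.append(unquote(line.substring(7)));
                current = Field.MSGSTR;
                haveStr = true;
            } else if (line.startsWith("\"")) {
                // Continuation line: append to whichever field came last.
                if (current == Field.MSGID) id.append(unquote(line));
                else if (current == Field.MSGSTR) str.append(unquote(line));
            } else {
                // Comments, blank lines, msgctxt, msgid_plural, msgstr[n]: ignored.
                current = Field.NONE;
            }
        }
        if (haveId && haveStr) result.put(id.toString(), str.toString());
        return result;
    }

    // Strips the surrounding quotes and resolves \n, \t, \r, \" and \\.
    private static String unquote(String s) {
        s = s.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    case 'r': out.append('\r'); break;
                    default:  out.append(next); // covers \" and \\
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Note that the PO header entry (the one with an empty msgid) ends up in the map under the empty-string key, so you may want to drop it before loading the rest into a database.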

linuxbuild
  • 3
    I am also trying to do the same. Could you please tell me exactly how I can use 'msgfmt' in my Java project to convert a PO file to a ResourceBundle? It looks like a command. – Prashant Shilimkar Apr 14 '16 at 07:32
5

gettext-commons is the only one I found when I did some research a while back.

Aravind Yarram
  • I couldn't find the code in that project that actually reads PO files. Did you? – Mike Sickler Jan 09 '11 at 04:33
  • 1
    Gettext-commons calls msgfmt as an intermediate step, so you can't avoid creating a process. See this figure: http://xnap-commons.sourceforge.net/gettext-commons/gettext-structure.png – linuxbuild Jan 19 '11 at 23:54
2

A .MO parser (not Java, but Scala) that parses into a Map: http://scalamagic.blogspot.com/2013/03/simple-gettext-parser.html, source: http://pastebin.com/csWx5Sbb

user2053898
  • Welcome to Stack Overflow! Generally we like answers on the site to be able to stand on their own. Links are great, but if that link ever breaks, the answer should have enough information to still be helpful. Please consider editing your answer to include more detail. See the FAQ (http://www.stackoverflow.com/faq) for more info. – slm Apr 11 '13 at 12:56
  • First link is unusable: "This blog is open to invited readers only". – izogfif Aug 10 '18 at 09:59
2

The Tennera project on GitHub contains an ANTLR-based parser for GNU Gettext PO/POT files. I think it is used by Red Hat for web-based translation software.

Jörn Horstmann
  • For anyone looking, that gettext library was split out from Tennera as JGettext: https://github.com/zanata/jgettext – seanf Sep 01 '18 at 10:06
0

I have found some Java classes to read and write PO files: https://launchpad.net/po-parser

Florent Valdelievre