I'm using Facebook's Duckling to parse text. When I pass the text 13h 47m, it correctly classifies the entire text as a DURATION (13 hours 47 minutes).

However, when I pass the text 13h 47m 13s, it does not recognize the 13s as part of the DURATION. I expected it to parse the text as 13 hours, 47 minutes and 13 seconds, but it only returns 13h 47m as the DURATION and reports the trailing 13 as a separate number.

Command: curl -XPOST http://127.0.0.1:0000/parse --data 'locale=en_US&text=13h 47m 13s'
JSON Array: 
[
  {
    "latent": false,
    "start": 0,
    "dim": "duration",
    "end": 7,
    "body": "13h 47m",
    "value": {
      "unit": "minute",
      "normalized": {
        "unit": "second",
        "value": 49620
      },
      "type": "value",
      "value": 827,
      "minute": 827
    }
  },
  {
    "latent": false,
    "start": 8,
    "dim": "number",
    "end": 10,
    "body": "13",
    "value": {
      "type": "value",
      "value": 13
    }
  }
]
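As a sanity check on the numbers above, the arithmetic can be sketched in the shell. The helper name to_seconds is made up for this illustration and is not part of Duckling; the second call only shows what the total would be if the trailing 13s were folded into the duration.

```shell
#!/bin/sh
# Hypothetical helper: convert hours, minutes and seconds to total seconds.
to_seconds() {
  echo $(( $1 * 3600 + $2 * 60 + $3 ))
}

to_seconds 13 47 0    # prints 49620, the normalized value returned for "13h 47m"
to_seconds 13 47 13   # prints 49633, the total for the full "13h 47m 13s"
```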

Is this a bug? How can I update Duckling so that it parses the text as described above?

Daniel Wagner
Bradford Griggs

1 Answer

The documentation seems pretty clear about this:

To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated:

  • Duckling/<Dimension>/<Lang>/Rules.hs
  • Duckling/<Dimension>/<Lang>/Corpus.hs
  • Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)
  • Duckling/Rules/<Lang>.hs

Taking a look in Duckling/Duration/Rules.hs, I see:

ruleIntegerUnitofduration = Rule
  { name = "<integer> <unit-of-duration>"
  , pattern =
    [ Predicate isNatural
    , dimension TimeGrain
    ]
  -- ...

So next I peeked in Duckling/TimeGrain/EN/Rules.hs (because Duckling/TimeGrain/Rules.hs did not exist), and saw:

grains :: [(Text, String, TG.Grain)]
grains = [ ("second (grain) ", "sec(ond)?s?",      TG.Second)
         -- ...

Presumably this means 13h 47m 13sec would parse the way you want. To make 13h 47m 13s parse the same way, the first thing I would try is making the regex above a bit more permissive, maybe something like s(ec(ond)?s?)?, and then checking that it does the trick without breaking anything else you care about.
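Before rebuilding Duckling, the two patterns can be compared quickly from the shell. This is only a rough sketch: it uses grep -E rather than Duckling's own regex engine, it tests each token in isolation instead of inside running text, and the function name matches_grain is made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when TOKEN as a whole matches PATTERN.
# -E: extended regex, -i: ignore case, -x: whole-line match, -q: quiet.
matches_grain() {
  printf '%s\n' "$2" | grep -Eiqx -- "$1"
}

original='sec(ond)?s?'        # pattern quoted from TimeGrain/EN/Rules.hs
proposed='s(ec(ond)?s?)?'     # more permissive pattern suggested above

# Print a small table: pattern, token, and whether the token matches.
for token in s sec secs second seconds; do
  for pattern in "$original" "$proposed"; do
    if matches_grain "$pattern" "$token"; then
      verdict='match'
    else
      verdict='no match'
    fi
    printf '%-16s %-8s %s\n' "$pattern" "$token" "$verdict"
  done
done
```

The only row where the two patterns should differ is the bare s, which is exactly the case from the question. Whether accepting a lone s as a seconds grain causes unwanted matches elsewhere is something only Duckling's own corpus tests can answer.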

Daniel Wagner
  • Brilliant! Thanks! Since I'm essentially editing the source code, is there a way to ensure that my changes don't get overwritten when I download a new version of Duckling from GitHub, other than manually editing the file again? – Bradford Griggs Mar 23 '22 at 22:06
  • @BradfordGriggs Please see [this issue on GitHub](https://github.com/facebook/duckling/issues/29). Would be interested if @Daniel Wagner could chime in on this. – Vance Brockberg Mar 23 '22 at 22:13
  • @BradfordGriggs You can add a specific commit hash (that includes your new commit changing the regex) to your cabal project file. For the more general case, where you do actually want the commit to change, well... `git merge`, I guess? – Daniel Wagner Mar 23 '22 at 22:20
  • @VanceBrockberg I'm not sure there's much productive I can chime in on. It sounds like a hard problem, and I've never touched this library before today. I wouldn't presume to know the best way forward. – Daniel Wagner Mar 23 '22 at 22:21