Both are the same form (the potential) of different types of verbs, so shouldn't they both parse into a single token? Even if two tokens makes more sense, I would expect the tokenizer to be consistent and split both into two.

Edit: as pointed out in the comments, 見られる could also be passive. Another example is 食べれる, which also splits into two tokens (食べ+れる) but is unambiguously potential.

Rollie
  • Just out of curiosity: what do you get for 見える? 見られる could also be the passive form, maybe that results in the difference? – Robby Cornelissen Jul 28 '17 at 03:51
  • 見える comes back as a single token. Interestingly, 行かれる comes back as 行か+れる, which is also not consistent with 見られる tokenization (would expect 行+かれる if 見+られる is correct). Good point regarding passive - edited question. – Rollie Jul 28 '17 at 04:08

1 Answer

Short answer: because 行ける and 見える are in the dictionary and 見られる isn't. (Note: this is the case for both IPAdic and Unidic.)

In the case of 行ける and 見られる the distinction is pretty simple: 行く is a 五段 (five-step) verb and 見る is not. In Unidic and IPAdic, the stems of five-step verbs are registered as entries because of the way verb endings are treated. Verb endings are basically all either 助動詞 (recognizable units like られる that can't stand alone) or 補助動詞 (things like しまう that can stand alone). Dictionary-form endings like る or う aren't considered either of those, so they don't get their own part-of-speech tag and instead form one token with the verb root.
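To make the "it's in the dictionary or it isn't" point concrete, here is a minimal sketch using a toy greedy longest-match lookup. This is not how MeCab actually segments text (MeCab picks the lowest-cost path through a lattice of candidates), and the `LEXICON` contents and the `tokenize` helper are hypothetical, chosen only to mirror the entries described in this answer.

```python
# Toy lexicon: every entry is an illustrative assumption, not real IPAdic/Unidic data.
LEXICON = {
    "行ける",  # listed as its own verb entry
    "見える",  # listed as its own base-form verb
    "行か",    # one of the registered stem variations of the 五段 verb 行く
    "見",      # stem of the 一段 verb 見る
    "食べ",    # stem of the 一段 verb 食べる
    "られる",  # 助動詞
    "れる",    # 助動詞
}


def tokenize(text, lexicon=LEXICON):
    """Greedy longest-match segmentation; unknown characters become 1-char tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing matched: emit a single character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


for word in ["行ける", "見える", "見られる", "行かれる", "食べれる"]:
    print(word, "->", "+".join(tokenize(word)))
```

With this toy lexicon, 行ける and 見える come back whole because they are entries themselves, while 見られる, 行かれる, and 食べれる split into a registered stem plus an auxiliary, matching the behavior reported in the question.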

With 見える the situation is a little trickier: 見える is treated as a root verb in its own right, not just as the potential form of 見る. If you look at lex.csv in Unidic, for example, you'll see a bunch of conjugations of 見える where 見える is given as the base form. Dictionaries commonly give 見える its own entry as well, partly for historical reasons (check まみえる).

For a longer explanation of how and why verbs are broken into multiple tokens, look up the details of Short Unit Words, Medium Unit Words, Long Unit Words, and Bunsetsu. Documentation from NINJAL covers the concepts but with little detail for verbs; Comainu is a system that can detect all of these classes; and this lengthy article provides a good overview of the history in English.

Hope that helps!

polm23
  • Thanks! It's a very good explanation, and the linked cjvlang article was interesting to look through. But while I understand that some potential forms of verbs have entries in Unidic, I'm still not sure why (I'm afraid I didn't find anything related to まみえる). Why would 食べれる not be listed, but 行ける is? – Rollie Aug 14 '17 at 06:45
  • 食べる is 一段活用 (the stem does not change), so the stem (食べ) is the only thing in the dictionary besides the base form. 行く is 五段活用 (the stem changes), so **all stem variations** are in the dictionary, because vowel changes in conjugations are *part of the stem*. – polm23 Aug 15 '17 at 05:13
  • Also note: 食べれる is not the standard potential form, it's what's called a ら抜き言葉, very common in speech but historically considered a grammatical error. 食べられる is the standard potential form. Seems like Unidic includes ら抜き言葉 but IPAdic does not, though they don't seem to come out in a normal parse... – polm23 Aug 15 '17 at 05:17
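
The 一段/五段 contrast from the comments above can be sketched as follows. This is a rough illustration only: `GODAN_ROWS` and `stems` are hypothetical names, only three kana rows are included, and euphonic forms such as 行っ (as in 行った) are deliberately left out.

```python
# Kana rows for a few 五段 endings (illustrative subset, not a complete table).
GODAN_ROWS = {
    "く": "かきくけこ",
    "む": "まみむめも",
    "る": "らりるれろ",
}


def stems(dictionary_form, verb_class):
    """Return the surface variations a dictionary would need for this verb."""
    base, ending = dictionary_form[:-1], dictionary_form[-1]
    if verb_class == "ichidan":
        # 一段: the stem never changes, so one entry covers every conjugation.
        return [base]
    if verb_class == "godan":
        # 五段: the final kana moves through its row, so each variation is listed.
        return [base + kana for kana in GODAN_ROWS[ending]]
    raise ValueError(f"unsupported verb class: {verb_class}")


print(stems("食べる", "ichidan"))  # ['食べ']
print(stems("行く", "godan"))      # ['行か', '行き', '行く', '行け', '行こ']
```

The single 一段 stem is why 食べ plus an auxiliary is the natural split, while a 五段 verb needs each vowel variation registered separately.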