How do I use regex to insert a new line in Japanese text?

Question

I have been having trouble formatting text. I have a Japanese dictionary definition and would like to insert a new line every time I see something like: （１）（２）（３）

The code I have now only works sometimes:

function formatDefinition(def) {
  let reg = /[\uFF10-\uFF19]/g; // matches every full-width digit (０-９)
  let finalDef;
  const numArr = def.match(reg);
  if (numArr) {
    for (let i = 0; i < numArr.length; i++) {
      // The "- 1" steps back one character to include the opening parenthesis.
      if (i == 0) {
        finalDef = `<div class="mb-1">${def.substring(0, def.indexOf(numArr[i]) - 1)}<br /></div>`;
        finalDef += `${def.substring(def.indexOf(numArr[i]) - 1, def.indexOf(numArr[i+1]) - 1)}`;
      } else if ((i != numArr.length - 1) && i > 0) {
        finalDef += `<br />${def.substring(def.indexOf(numArr[i]) - 1, def.indexOf(numArr[i+1]) - 1)}`;
      } else if (i == numArr.length - 1) {
        finalDef += `<br />${def.substring(def.indexOf(numArr[i]) - 1, def.length)}`;
      }
    }
    return `<p>${finalDef}</p>`;
  } else {
    return def;
  }
}

Basically, I find each number (e.g. "１") and insert a line break before it. Would there be a better way of doing this with .replace()? The problem is that when numbers repeat (e.g. （１）（２）（１）), my code breaks.

The function input, "def", would look something like this:

let def = 'ひ 【非】 ■一■ [1] （名） （１）道理に合わないこと。不正。 ⇔是 「―をあばく」「―とする」 （２）不利であること。うまくゆかないこと。「形勢―なり」 （３）あやまり。欠点。「―を認める」 （４）そしること。「―を唱える」 ■二■ （接頭） 漢語の名詞・形容動詞に付いて，それに当たらない，それ以外である，などの意を表す。「―能率的」「―常識」「―公式」'

This works fine with my code, but something like this breaks it, as there is a repeated number:

let def = 'かか・る [2] 【掛(か)る・懸(か)る】 （動ラ五［四］） ❶物がほかの物に取り付けられたり，支えられたりしてそこにある。《懸・掛》 （１）上方に掲げられる。ぶらさがっている。「壁に絵が―・っている」「凧(タコ)が木の枝に―・る」「大きな看板が―・った店」「戸口に表札が―・っている」「のれんが―・っている」 （２）中空にある。「月が中天に―・る」「天の川が夜空に―・る」 （３）〔自在鉤にかけて火の上に置いたことから〕 鍋などが火の上にのせられている。「ガスコンロに鍋が―・っている」 （４）〔竿秤(サオバカリ)の鉤にかけて重さをはかることから〕 秤で重さが量られる。「重すぎてこの秤には―・らない」 （５）もたれる。よりかかる。「手すりに―・って休む」「もたれ―・る」「しなだれ―・る」「かきおこされて人に―・りてものす/蜻蛉（上）」 （６）仕組んだものに捕らえられる。「大きな魚が網に―・る」「わなに―・る」「計略に―・る」 （７）（「心にかかる」などの形で）心配になる。「子供のことが気に―・る」「心に―・る」 （８）戸などが開かないように，掛け金や鍵で固定されている。「ドアに鍵が―・っている」 ❷物が上方に置かれる。《懸・掛》 （１）ある物がほかの物を覆うように置かれる。「雲が月に―・る」「霞が―・る」「カバーが―・った本」「ワックスが―・った床」 （２）液体や粉末が上方から注がれる。「水が―・る」「波しぶきが―・る」「雨が―・る」「ほこりが―・る」「ドレッシングの―・ったサラダ」 ❸身に作用を受ける。《懸・掛》 （１）好ましくない作用を受ける。「あなたに迷惑が―・っては申し訳ない」'

My code just keeps repeating the text from the first （１） several times. Any solutions or suggestions?

Thank you very much.

score 1 · Accepted Answer · answered Oct 23 '20 at 03:08

Is this what you want?

function formatDefinition(def) {
  let reg = /（[\uFF10-\uFF19]+）/g // gets the Japanese numbers in Japanese parentheses
  return def.replace(reg, '<br />$&');
}
let def1 = 'ひ 【非】 ■一■ [1] （名） （１）道理に合わないこと。不正。 ⇔是 「―をあばく」「―とする」 （２）不利であること。うまくゆかないこと。「形勢―なり」 （３）あやまり。欠点。「―を認める」 （４）そしること。「―を唱える」 ■二■ （接頭） 漢語の名詞・形容動詞に付いて，それに当たらない，それ以外である，などの意を表す。「―能率的」「―常識」「―公式」'
let def2 = 'かか・る [2] 【掛(か)る・懸(か)る】 （動ラ五［四］） ❶物がほかの物に取り付けられたり，支えられたりしてそこにある。《懸・掛》 （１）上方に掲げられる。ぶらさがっている。「壁に絵が―・っている」「凧(タコ)が木の枝に―・る」「大きな看板が―・った店」「戸口に表札が―・っている」「のれんが―・っている」 （２）中空にある。「月が中天に―・る」「天の川が夜空に―・る」 （３）〔自在鉤にかけて火の上に置いたことから〕 鍋などが火の上にのせられている。「ガスコンロに鍋が―・っている」 （４）〔竿秤(サオバカリ)の鉤にかけて重さをはかることから〕 秤で重さが量られる。「重すぎてこの秤には―・らない」 （５）もたれる。よりかかる。「手すりに―・って休む」「もたれ―・る」「しなだれ―・る」「かきおこされて人に―・りてものす/蜻蛉（上）」 （６）仕組んだものに捕らえられる。「大きな魚が網に―・る」「わなに―・る」「計略に―・る」 （７）（「心にかかる」などの形で）心配になる。「子供のことが気に―・る」「心に―・る」 （８）戸などが開かないように，掛け金や鍵で固定されている。「ドアに鍵が―・っている」 ❷物が上方に置かれる。《懸・掛》 （１）ある物がほかの物を覆うように置かれる。「雲が月に―・る」「霞が―・る」「カバーが―・った本」「ワックスが―・った床」 （２）液体や粉末が上方から注がれる。「水が―・る」「波しぶきが―・る」「雨が―・る」「ほこりが―・る」「ドレッシングの―・ったサラダ」 ❸身に作用を受ける。《懸・掛》 （１）好ましくない作用を受ける。「あなたに迷惑が―・っては申し訳ない」'
console.log(formatDefinition(def1));
console.log(formatDefinition(def2));

Note the full-width (CJK) parentheses in the regular expression (you may want to replace them with Unicode escape sequences).
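As a rough sketch of that escape-sequence variant: U+FF08 and U+FF09 are the full-width parentheses, and U+FF10 to U+FF19 are the full-width digits. The names `MARKER` and `formatDefinitionEscaped` and the short sample string are made up for illustration; the behaviour should be the same as `formatDefinition` above.

```javascript
// Same pattern as above, written with Unicode escapes only, so the source
// file contains no literal full-width characters.
// \uFF08 = （   \uFF09 = ）   \uFF10-\uFF19 = ０-９
const MARKER = /\uFF08[\uFF10-\uFF19]+\uFF09/g;

// Hypothetical name, used here only to keep it apart from formatDefinition.
function formatDefinitionEscaped(def) {
  // $& stands for the whole match, so each marker is kept and
  // simply gets a <br /> put in front of it.
  return def.replace(MARKER, '<br />$&');
}

// Shortened, made-up sample with a repeated （１）.
const sample = '（名） （１）道理。 （２）不利。 （１）再び。';
console.log(formatDefinitionEscaped(sample));
```

Because `replace` with the `g` flag visits every match in order, a repeated （１） is handled like any other marker, and （名） is left alone since it contains no full-width digit.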

Perfect! Thank you so much! You turned my messy code into like 2 lines :). — Edwin, Oct 23 '20 at 03:23
How would I modify the regex you gave so it ignores something like {（１）}? — Edwin, Oct 24 '20 at 22:31
I was able to figure it out: ```/(\uFF08[\uFF10-\uFF19]+\uFF09)(?![^{]*})/g``` from https://stackoverflow.com/questions/12493128/regex-replace-text-but-exclude-when-text-is-between-specific-tag (zb226) — Edwin, Oct 24 '20 at 22:48
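For anyone reading later, here is a small sketch of how the regex in the last comment behaves. The negative lookahead `(?![^{]*})` rejects a match when a `}` can be reached from the end of the marker without passing a `{` first, which is what happens when the marker sits inside braces. This assumes the braces are balanced and not nested; `OUTSIDE_BRACES`, `formatOutsideBraces`, and the sample string are hypothetical.

```javascript
// Marker pattern plus a negative lookahead that skips markers inside {...}.
// Assumption: braces in the input are balanced and never nested.
const OUTSIDE_BRACES = /(\uFF08[\uFF10-\uFF19]+\uFF09)(?![^{]*})/g;

// Hypothetical helper name for illustration.
function formatOutsideBraces(def) {
  // $1 is the captured marker; the lookahead itself consumes nothing.
  return def.replace(OUTSIDE_BRACES, '<br />$1');
}

// Made-up sample: （２） is wrapped in braces and should be left untouched.
console.log(formatOutsideBraces('語 （１）甲 {（２）} 乙 （３）丙'));
```

In this sample only （１） and （３） get a `<br />` in front of them.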
