I have a PowerShell script that I will eventually use to translate an XML file by replacing its Japanese words with English. For now, it is just a simple regex-matching example:
$pattern = "(?<=\>)[\p{IsHiragana}\p{IsKatakana}\p{IsCJKUnifiedIdeographs}]+(?=\<\/)"
$text = 'tag3>日本語</tag>漢字</tag>.'
$results = $text | Select-String -Pattern $pattern -AllMatches | ForEach-Object { $_.Matches.Value }
$results
This works fine and will return the following:
日本語
漢字
However, I also want it to grab one or more English characters before or after the Japanese characters, with the whole match wrapped between > and </.
For this string:
tag3>Some text before 日本語 and some text after</tag>Before text 漢字</tag>
It should grab these:
Some text before 日本語 and some text after
Before text 漢字
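For reference, here is a minimal sketch of the kind of pattern that might produce this result. It is an assumption, not a known-good solution: it treats "English characters" loosely as any run of characters other than < and >, requires at least one Japanese character somewhere in the run, and assumes each segment is directly preceded by > and directly followed by </. The names $jp, $widePattern, $sample and $found are made up for illustration.

```powershell
# Japanese character class, same Unicode blocks as in the original pattern.
$jp = '[\p{IsHiragana}\p{IsKatakana}\p{IsCJKUnifiedIdeographs}]'

# Assumption: "English text" is approximated as any characters except < and >.
# (?<=>)   the match must be directly preceded by >
# [^<>]*   optional non-tag text before the Japanese character
# $jp      at least one Japanese character is required
# [^<>]*   optional non-tag text (or more Japanese) after it
# (?=</)   the match must be directly followed by </
$widePattern = "(?<=>)[^<>]*$jp[^<>]*(?=</)"

# Sample input; assumes the second segment is also preceded by >.
$sample = 'tag3>Some text before 日本語 and some text after</tag>Before text 漢字</tag>'

# Collect the text of every match.
$found = [regex]::Matches($sample, $widePattern) | ForEach-Object { $_.Value }
$found
```

Because [^<>]* also accepts digits, punctuation and whitespace, this is broader than "English letters only"; a stricter class such as [A-Za-z ] could be substituted if that matters.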