
I've been messing with this PowerShell script (I installed PowerShell on macOS).

The script is supposed to count how many times each Japanese character shows up in a .txt file. Instead of actual Japanese characters like in the expected screenshot, I am getting numbers back.

None of these .txt files even contain many numbers.

This is roughly what I am expecting: [screenshot: expected outcome]

Basically, it should output Japanese characters and how often each one shows up.

Current output: [screenshot: current flawed outcome]. My output gives me random single letters and numbers instead of Japanese characters like in the expected outcome.

The text file my script reads from is almost entirely Japanese, with only a few Latin letters.

Here is my code (I only modified the first line):

$folder = “/Users/mbp/Desktop/nier_unpacked_2_extracted“
$files = gci -recurse $folder | where { ! $_.PSIsContainer }
$fileContents = $files | foreach { gc -encoding utf8 $_.fullname }
$lines = $fileContents | foreach { if ($_ -match "^JP: (.*)$") { $matches[1] } }
$chars = $lines | foreach { $_.ToCharArray() }
$groups = $chars | group-object
$totals = $groups | sort-object -desc -property count

This is the original code (before modification):

$folder = "F:\nier_unpacked_2_extracted"
$files = gci -recurse $folder | where { ! $_.PSIsContainer }
$fileContents = $files | foreach { gc -encoding utf8 $_.fullname }
$lines = $fileContents | foreach { if ($_ -match "^JP: (.*)$") { $matches[1] } }
$chars = $lines | foreach { $_.ToCharArray() }
$groups = $chars | group-object
$totals = $groups | sort-object -desc -property count
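A quick way to check whether the counting pipeline itself handles Japanese text, independent of the files on disk, is to feed it one hard-coded line in the same `JP: ...` format. This is a minimal sketch: the sample string is made up for illustration, and the last line is added only to print each character next to its count.

```powershell
# Hard-coded input in place of the file-reading lines (sample text is made up)
$fileContents = 'JP: 日本語の本'
# Keep only the text after the "JP: " prefix
$lines  = $fileContents | foreach { if ($_ -match "^JP: (.*)$") { $matches[1] } }
# Split each line into individual characters
$chars  = $lines | foreach { $_.ToCharArray() }
# Group identical characters, then sort by how often they occur, most frequent first
$groups = $chars | group-object
$totals = $groups | sort-object -desc -property count
# Print the character (Name) next to its frequency (Count)
$totals | select-object Name, Count
```

If `本` comes out on top with a count of 2, the counting logic is fine and the problem lies in what is being read from the files.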

Here is the link to the resource I got the code from, if that helps: https://dev.to/nyctef/extracting-game-text-from-nier-automata-1gm0

四季朝
  • When you print the `$lines` variable in the console, are there Japanese characters in the output? Can you check the file encoding in a text editor? Maybe it's not UTF-8. – mcbr Nov 08 '20 at 10:44
  • I'm pretty sure this is a duplicate question; check [Displaying Unicode in Powershell](https://stackoverflow.com/a/49481797/1701026) and [Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows Powershell (Windows 10)](https://stackoverflow.com/a/57134096/1701026). Long story short: try another [IDE](https://en.wikipedia.org/wiki/Integrated_development_environment), e.g. [Visual Studio Code](https://code.visualstudio.com/) – iRon Nov 08 '20 at 11:03
  • @iRon Huh? PowerShell displays Japanese characters just fine without needing any special magic. – Tomalak Nov 08 '20 at 11:11
  • @Tomalak, my Japanese is a little rusty and I am probably mistaken, but if I enter the Hiragana Unicode character [`[char]0x306e`](https://stackoverflow.com/a/53807563/1701026) in Visual Studio Code (with the PowerShell extension), it returns `の`. – iRon Nov 08 '20 at 11:31
  • @iRon I'm not sure I'm following. The OP has a bunch of text files that contain Japanese lines and wants to group them by character. That works just fine (even the code he shows works just fine). None of it requires calling `CHCP` (that's not available anyway in PowerShell), changing the IDE, or doing anything except running the code. What's your point? – Tomalak Nov 08 '20 at 11:39
  • As an aside, you should not use curly so-called 'smart quotes' like `“` and `“` in code, because they can do weird things there. Replace them with straight quotes. Also, you show us the expected output, but not what the code as you have it now produces. Please add that to the question as well. – Theo Nov 08 '20 at 11:46
  • @Theo The designers of PowerShell actually allow curly quotes as string delimiters. I wouldn't say it was a wise design decision, but it has been made. – Tomalak Nov 08 '20 at 11:54
  • @Tomalak, I probably shouldn't have said "*another IDE*" but "*an IDE*". It works in Visual Studio Code and [Windows PowerShell ISE, but not from the console](https://stackoverflow.com/a/20023046/1701026) (at least for me). @四季朝, can you add some details to the question on the PowerShell version, operating system, and how you invoke the script? (If you are on Windows, can you try **Windows PowerShell *ISE***: run `PowerShell_ISE.exe`.) – iRon Nov 08 '20 at 13:54
  • To create an [mcve], replace the first **3** lines with: `$fileContents = 'JP: ぁ あ ぃ い ぅ う ぇ え ぉ お か'` – iRon Nov 08 '20 at 14:00
  • Hello, I just woke up and will be testing all of these suggestions. I added an image showing the output; it would be too difficult to put it in text, so please excuse the screenshot. – 四季朝 Nov 08 '20 at 16:10
  • @iRon Running the script in Visual Studio Code worked! It's a shame it doesn't work in the base terminal, but hey, now I am using Visual Studio Code, which is pretty robust. Thank you! – 四季朝 Nov 08 '20 at 17:23
  • I also converted the file from RTF to TXT; that may have helped as well. – 四季朝 Nov 08 '20 at 17:37
  • It turns out the issue was that the file was RTF instead of plain text. I tried it in the normal terminal and it worked as well. – 四季朝 Nov 08 '20 at 21:22
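Since the culprit turned out to be an RTF file, one way to catch this up front is to scan the folder for files whose first line starts with `{\rtf`, the control word every RTF document begins with. RTF typically stores non-ASCII text as escape sequences (for example `\u12354` for あ), which is likely why letters and digits were being counted instead of Japanese characters. This is a minimal sketch; `Find-RtfFile` is a hypothetical helper name, not part of the original script.

```powershell
# Hypothetical helper (not from the original script): list files under a folder
# whose content is RTF, regardless of their extension. RTF documents begin with
# the control word "{\rtf", so the first line is enough to tell.
function Find-RtfFile {
    param([string]$Path)
    Get-ChildItem -LiteralPath $Path -Recurse -File | Where-Object {
        # -TotalCount 1 reads only the first line of each file
        $first = Get-Content -LiteralPath $_.FullName -TotalCount 1
        # In -like patterns, "{" and "\" are literal; only "*" is a wildcard here
        $first -like '{\rtf*'
    } | ForEach-Object { $_.FullName }
}
```

For example, `Find-RtfFile "/Users/mbp/Desktop/nier_unpacked_2_extracted"` would list any RTF files hiding among the extracted .txt files.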
