I have an ARFF file which has the following attributes:
@ATTRIBUTE "åäö" NUMERIC
@ATTRIBUTE "åøã" NUMERIC
The file is saved with UTF-8 encoding. I am reading this file in my Java application using the Weka API. I can run the program without any issue from Eclipse.
However, when I try to run the program from PowerShell or the command prompt (simply using java -jar my-app.jar -data path/to/mydata.arff), I get the error below:
java.io.IOException: Unable to determine structure as arff (Reason: java.lang.IllegalArgumentException: Attribute names are not unique! Causes: 'å??' ).
at weka.core.converters.ArffLoader.getStructure(ArffLoader.java:1204)
at weka.core.converters.ArffLoader.getDataSet(ArffLoader.java:1234)
at weka.core.converters.ConverterUtils$DataSource.getDataSet(ConverterUtils.java:269)
I tried to change the console encoding (the default is OEM United States (IBM437)) as below.
Attempt 1: Setting UTF-8 encoding in my ps1 script as below (source):
$OutputEncoding = New-Object -typename System.Text.UTF8Encoding
[Console]::OutputEncoding = New-Object -typename System.Text.UTF8Encoding
This didn't help; it only changed the console output from ...Causes: 'å??'... to ...Causes: '�??'...
Attempt 2: Changing the encoding directly in the console as below (source):
$OutputEncoding = [Console]::OutputEncoding
This didn't work either.
Is there any way this can be fixed?
Update: This question is not a duplicate of Printing Unicode characters to the PowerShell prompt. In my case it does not matter whether the right character is displayed on the command prompt, as my program does not attempt to print it. Also, please note that the answer to that question (using [Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(850)) produced exactly the same result, and thus provided no solution to this problem.
Additionally, running the program from PowerShell ISE and from ConEmu didn't help either.
I assume that if the correct encoding can be set for the 'session' (or environment/context, I am not sure what to call this), it would be enough for my program to process the ARFF file correctly. However, I am not sure how to do that.
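To illustrate what I suspect is happening, here is a minimal sketch that uses only the Java standard library (no Weka). It is an assumption on my part, not something the stack trace confirms: if the UTF-8 bytes of the file are decoded with a charset that cannot represent them, two different attribute names can end up as the same string. The class name CharsetCollisionDemo and the choice of US-ASCII as the "wrong" charset are hypothetical and only for illustration; the charset actually used in my console run may differ, which is why the sketch also prints the JVM default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCollisionDemo {

    // Encode the text as UTF-8 (as the ARFF file is saved), then decode
    // the bytes with the given charset, as a reader using that charset would.
    static String decode(String text, Charset charset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    public static void main(String[] args) {
        // The two attribute names from the ARFF header, written as Unicode
        // escapes so the source file's own encoding does not matter.
        String first = "\u00e5\u00e4\u00f6";  // åäö
        String second = "\u00e5\u00f8\u00e3"; // åøã

        // Charset the JVM picks when none is specified; this may differ
        // between an Eclipse launch and a plain console launch.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Decoded as UTF-8, the names stay distinct.
        boolean distinctUtf8 = !decode(first, StandardCharsets.UTF_8)
                .equals(decode(second, StandardCharsets.UTF_8));
        System.out.println("Distinct when decoded as UTF-8: " + distinctUtf8);

        // Decoded as US-ASCII (illustrative wrong charset), every non-ASCII
        // byte becomes the replacement character, so the names collide.
        boolean collideAscii = decode(first, StandardCharsets.US_ASCII)
                .equals(decode(second, StandardCharsets.US_ASCII));
        System.out.println("Identical when decoded as US-ASCII: " + collideAscii);
    }
}
```

If the default charset printed from the console differs from the one printed under Eclipse, that would support the idea that the fix belongs on the Java side (for example, starting the JVM with the file.encoding system property set to UTF-8) rather than in the console's output encoding.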