Japanese text error in Google speech api php

Question

The Google Speech API works fine for me when I use 'languageCode' => 'en-US' with an English audio file. But when I use 'languageCode' => 'ja-JP' with a Japanese audio file, it returns broken text like "Transcription: ã‚‚ã—ã‚‚ã—è² ã‘ãƒ›ãƒ³ãƒ€ã—ã¦ã‚‚ã—ã‚‚ã—"

Sample code from Google:

# Includes the autoloader for libraries installed with composer
require __DIR__ . '/vendor/autoload.php';

# Imports the Google Cloud client library
use Google\Cloud\Speech\SpeechClient;

# Your Google Cloud Platform project ID
$projectId = 'YOUR_PROJECT_ID';

# Instantiates a client
$speech = new SpeechClient([
    'projectId' => $projectId,
    'languageCode' => 'en-US',
]);

# The name of the audio file to transcribe
$fileName = __DIR__ . '/resources/audio.raw';

# The audio file's encoding and sample rate
$options = [
    'encoding' => 'LINEAR16',
    'sampleRateHertz' => 16000,
];

# Detects speech in the audio file
$results = $speech->recognize(fopen($fileName, 'r'), $options);

foreach ($results[0]->alternatives() as $alternative) {
    echo 'Transcription: ' . $alternative['transcript'] . PHP_EOL;
}

I've checked the Cloud Speech API Client Libraries and followed the sample from Google.

score 0 · Accepted Answer · answered Oct 03 '17 at 01:23

The Google Speech API returns the Japanese response correctly inside $results. The default encoding is UTF-8, as is clearly stated in the documentation: Google\Cloud\Language\LanguageClient

The problem was the echo inside the foreach loop, which is where the Japanese characters came out broken. In my case I don't actually need to echo the text; I only need to use $results, so it's working fine for me now.

If someone does want to use echo to show the result, the following links may be helpful.
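
For anyone who does need echo, here is a minimal sketch rather than a definitive fix. The garbled sample in the question is consistent with valid UTF-8 bytes being displayed as a single-byte encoding such as Windows-1252, so the sketch assumes the output goes to a browser and declares the charset before printing. formatTranscript is a hypothetical helper, not part of the Google client library, and the literal string stands in for $alternative['transcript'].

```php
<?php
// Hypothetical helper (not part of the Google client library): builds the
// output line and refuses input that is not valid UTF-8.
function formatTranscript(string $transcript): string
{
    // With the /u modifier, preg_match fails on invalid UTF-8 input.
    if (preg_match('//u', $transcript) !== 1) {
        throw new InvalidArgumentException('Transcript is not valid UTF-8.');
    }
    return 'Transcription: ' . $transcript . PHP_EOL;
}

// When the script is served over HTTP, declare the charset explicitly so the
// browser does not fall back to a single-byte encoding.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/plain; charset=utf-8');
}

// Stand-in for $alternative['transcript'] from the loop in the question.
echo formatTranscript('もしもし');
```

If the output goes to a terminal instead, the terminal itself has to be set to UTF-8. An HTML page can declare the same thing with a meta charset tag.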

Thanks.
