I have to restore a MySQL 5.6 dump that was created in PowerShell with the script below:

cd "C:\Program Files\MySQL\MySQL Server 5.6\bin"
.\mysqldump.exe --defaults-extra-file=C:\backups\script.cnf -h 127.0.0.1 my_base > C:\backups\dump.sql

Unfortunately, the dump file contains badly encoded characters (├® for é, ├á for à, ├┤ for ô, etc.).
The dump file's encoding is "UTF-16 LE BOM" (according to Notepad++).
Here is a sample:

-- MySQL dump 10.13  Distrib 5.6.35, for Win64 (x86_64)
--
-- Host: 127.0.0.1    Database: my_base
-- ------------------------------------------------------
-- Server version   5.6.35-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `my_table`
--

DROP TABLE IF EXISTS `my_table`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `my_table` (
  `code` varchar(100) NOT NULL DEFAULT '',
  PRIMARY KEY (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `my_table`
--

LOCK TABLES `my_table` WRITE;
/*!40000 ALTER TABLE `my_table` DISABLE KEYS */;
INSERT INTO `my_table` VALUES ('├® ├á ├┤');
/*!40000 ALTER TABLE `my_table` ENABLE KEYS */;
UNLOCK TABLES;

I know that the command used to dump the database is messy on Windows, but I have no choice: I can't get a new dump and have to work with this one.
Is there a way to fix the file to get the correct data?
If not, I am ready to find and replace!

mklement0
JYVD

If it's UTF-16, the character is likely encoded correctly in the dump, but is being read by something expecting a different charset (like UTF-8). – Joel Coehoorn Oct 05 '21 at 19:05

1 Answer

You can avoid the problem by using the --result-file option (see the docs), which makes mysqldump itself write to a file, so that PowerShell never handles the data:

.\mysqldump.exe --result-file=C:\backups\dump.sql --defaults-extra-file=C:\backups\script.cnf -h 127.0.0.1 my_base

As for what you tried:

There are two problems (applies as of PowerShell 7.2):

  • Even when you use > to send data output by an external program to a file, PowerShell decodes the data into .NET strings first, using the character encoding stored in [Console]::OutputEncoding, which (unfortunately, still as of PowerShell 7.2) defaults to the system's active OEM code page.

    • It sounds like what mysqldump.exe outputs is actually UTF-8-encoded, which therefore causes PowerShell to misinterpret the output.

    • (Temporarily) setting [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new() would correct this misinterpretation.

  • > is in effect an alias for the Out-File cmdlet, which uses its default character encoding on output, which is unrelated to the original encoding. In Windows PowerShell, that default encoding is UTF-16LE ("Unicode"). (PowerShell (Core) 7+ now fortunately consistently defaults to BOM-less UTF-8, across all cmdlets).

    • To control the output file's character encoding, pipe to Out-File instead of using >, or preferably to Set-Content (the better choice for text-only data, which output from external programs always is in PowerShell), and pass the desired encoding to the -Encoding parameter.
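
The two points above could be combined along the following lines. This is only a sketch: it assumes that mysqldump.exe really emits UTF-8 (which the SET NAMES utf8 line in the dump suggests), and the choice of utf8 for the output file is likewise an assumption. The paths are the ones from the question.

```powershell
# Sketch, assuming mysqldump.exe emits UTF-8; paths are taken from the question.
$prevEncoding = [Console]::OutputEncoding
try {
    # Make PowerShell decode the external program's output as UTF-8
    # instead of using the system's OEM code page.
    [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()

    # Pipe to Set-Content instead of using >, and state the file encoding explicitly.
    & 'C:\Program Files\MySQL\MySQL Server 5.6\bin\mysqldump.exe' --defaults-extra-file=C:\backups\script.cnf -h 127.0.0.1 my_base |
        Set-Content -Encoding utf8 C:\backups\dump.sql
}
finally {
    # Restore the previous console output encoding for the rest of the session.
    [Console]::OutputEncoding = $prevEncoding
}
```

Note that -Encoding utf8 writes a BOM in Windows PowerShell, whereas PowerShell (Core) 7+ writes BOM-less UTF-8.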

For detailed background information, see this answer.
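
If a new dump is not an option, the explanation above suggests that the damage may be reversible: the file holds the original UTF-8 bytes, decoded one byte per character with the OEM code page and then saved as UTF-16LE. Re-encoding the text with that same code page should therefore give back the original bytes. The sketch below assumes the OEM code page was 850, which is an inference from the sample (in CP850, the UTF-8 bytes C3 A9 of é display as ├®). The output file name is hypothetical.

```powershell
# Sketch: undo the mis-decoding by re-encoding the text with the OEM code page.
# Assumption: the OEM code page was 850; adjust if the system used a different one.
$inPath  = 'C:\backups\dump.sql'        # UTF-16LE file produced by >
$outPath = 'C:\backups\dump.fixed.sql'  # hypothetical name for the repaired file

# Read the UTF-16LE file into a .NET string (the BOM is detected and dropped).
$text = [System.IO.File]::ReadAllText($inPath, [System.Text.Encoding]::Unicode)

# Map every character back to the single byte it was decoded from.
# The resulting byte sequence is what mysqldump.exe originally wrote.
$oem   = [System.Text.Encoding]::GetEncoding(850)
$bytes = $oem.GetBytes($text)

# Write the bytes unchanged; if the assumptions hold, this is a BOM-less UTF-8 file.
[System.IO.File]::WriteAllBytes($outPath, $bytes)
```

This reads the whole dump into memory, so a very large file may call for a streamed variant. To check the result, open the repaired file as UTF-8 and confirm that the sample row now reads 'é à ô' before restoring from it; work on a copy so that the original dump stays intact.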

mklement0