
In a PowerShell script, I have a mysqldump command which writes to stdout. The goal is to replace all occurrences of a string in that output before pushing it into a file, because there is not enough disk space on the machine to hold two separate files (the dump is around 30 GB).

I have tried this (Invoke-Expression and the mysql arguments removed for brevity):

mysqldump [...args] | ForEach-Object -Process {$_ -replace 'sourceText','targetText' | Add-Content $dumpDataFile}

Or this:

mysqldump [...args] | Foreach-Object {$_ -replace 'sourceText','targetText'} | Set-Content $dumpDataFile

but it is eating up all the memory on the machine.

I have also tried replacing the content in the result file, but that always ends up copying it to another file. I also thought about reading line by line and writing the replaced lines to a new file, removing lines from the original file every X lines, but the methods I have found for cutting lines out of a file also end up eating all the memory.
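To illustrate the line-by-line idea, this is roughly what I mean (just a sketch with placeholder paths; it still needs room for a second file, which is exactly my problem):

# Sketch of the line-by-line replace with .NET streams (placeholder paths).
$reader = New-Object System.IO.StreamReader('C:\dumps\dump.sql')
$writer = New-Object System.IO.StreamWriter('C:\dumps\dump.replaced.sql')
while ($null -ne ($line = $reader.ReadLine())) {
    $writer.WriteLine($line -replace 'sourceText', 'targetText')
}
$writer.Close()
$reader.Close()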

On Linux I would have used sed; I know it exists for Windows, but I do not want to add a dependency to the script.

Here is the command that is run:

$expr = "& 'C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqldump.exe' --defaults-extra-file=env.cnf --log-error=err.log --no-create-info foo | ForEach-Object -Process {$_ -replace 'foo','bar' | Add-Content dump.sql}"
Invoke-Expression $expr

UPDATE: I have found that even piping to Out-Null eats up all the memory:

& 'C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqldump.exe' --defaults-extra-file=env.cnf --log-error=err.log --no-create-info foo | out-null

The script also runs on an Amazon virtual machine which has PowerShell 4.

UPDATE 2: This also eats up all the memory, but it does not when run from cmd:

& 'C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqldump.exe' --defaults-extra-file=env.cnf --log-error=err.log --no-create-info foo > dump.sql

Do you know how to call the full replace command via cmd? I cannot manage to escape the mysqldump executable path.
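For reference, this is the kind of call I am aiming for from the PowerShell script, letting cmd.exe handle the redirection (the quoting is a guess, which is exactly the part I cannot get right):

cmd /c '"C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqldump.exe" --defaults-extra-file=env.cnf --log-error=err.log --no-create-info foo > dump.sql'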

UPDATE 3: I realized that my dump contains huge tables, which results in some of the INSERT lines being extremely long (which may explain the memory usage). I tried without extended inserts, but the import then takes too long.

Bouni
  • Have you tried this on the relevant output file: (Get-Content $dumpDataFile).replace('sourceText', 'targetText') | Set-Content $dumpDataFile – Sanpas Jan 28 '19 at 07:48
  • Yes, it pushes all the content into memory before writing it – Bouni Jan 28 '19 at 08:54
  • OK, thanks. I don't have a big file (30 GB) for testing, sorry. Have you tried using some .NET classes in your PowerShell script, like System.IO? I think those perform better than the PowerShell cmdlets – Sanpas Jan 28 '19 at 08:59
  • Can you try this example: $content = [System.IO.File]::ReadAllText($path).Replace("[MYID]","MyValue"); [System.IO.File]::WriteAllText($path, $content) (from https://stackoverflow.com/a/30893960/9940803) – Sanpas Jan 28 '19 at 09:02
  • The command you posted should not exhaust your system's memory, unless it's the `mysqldump` process eating up memory. You said something about `Invoke-Expression`. Please show the exact command you're running. Do you get the same behavior if you run the command you posted in your question from a PowerShell console? – Ansgar Wiechers Jan 28 '19 at 10:01
  • @AnsgarWiechers I saw in the task manager that the powershell script was getting all the memory, not the mysqldump (and db is in a separate machine). I have updated the post with the command. I will try from the powershell console – Bouni Jan 28 '19 at 10:44
  • Remove the `Invoke-Expression`. It's not needed for what you're doing, and it's [considered harmful](https://blogs.msdn.microsoft.com/powershell/2011/06/03/invoke-expression-considered-harmful/) to begin with (see the sketch after these comments). – Ansgar Wiechers Jan 28 '19 at 11:16
  • @AnsgarWiechers the invocation string is built dynamically, how can I call it then? – Bouni Jan 28 '19 at 11:24
  • The entire commandline is built dynamically? Why? And how? Which parts of it are actually subject to change? – Ansgar Wiechers Jan 28 '19 at 12:37
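
A sketch of what the comments suggest: put the parts that change into an argument array and invoke mysqldump with the call operator instead of Invoke-Expression. The paths, the database name ($dbName) and the replacement strings below are placeholders.

# Sketch: dynamic parts go into an array; & invokes the executable directly,
# so no command string has to be rebuilt and re-parsed with Invoke-Expression.
$mysqldump = 'C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqldump.exe'
$dumpArgs = @('--defaults-extra-file=env.cnf', '--log-error=err.log', '--no-create-info', $dbName)

& $mysqldump @dumpArgs |
    ForEach-Object { $_ -replace 'sourceText', 'targetText' } |
    Add-Content dump.sql

This only removes the Invoke-Expression layer; by itself it does not address the memory usage described in the question.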

2 Answers


If disk space is at a premium, how about compressing the data? If NTFS compression isn't good enough, write the output into a GZipStream instead; it should offer good savings for text data, so the file on disk would be considerably smaller.
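
(For the NTFS route, compression can be switched on for the dump folder beforehand with the built-in compact tool; the path below is just an example.)

compact /c /s:C:\dumps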

First off, a compression function (idea from a blog post):

function Compress-Stream {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [AllowEmptyString()]
        [string]$Row
    )
    begin {
        # Collect compressed bytes in a MemoryStream; the StreamWriter feeds
        # text through the GZipStream into it.
        $ms = New-Object System.IO.MemoryStream
        $cs = New-Object System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Compress)
        $sw = New-Object System.IO.StreamWriter($cs)
    }
    process {
        # Skip blank lines; everything else goes through the compressor.
        if (-not [string]::IsNullOrWhiteSpace($Row)) {
            $sw.Write($Row + [Environment]::NewLine)
        }
    }
    end {
        # Close the writer first so its buffer is flushed through the
        # GZipStream before the streams are torn down; closing $cs first
        # can silently lose the last buffered chunk.
        try { $sw.Close(); $sw.Dispose() } catch {}
        try { $cs.Close(); $cs.Dispose() } catch {}

        # ToArray still works on a closed MemoryStream.
        $s = [System.Convert]::ToBase64String($ms.ToArray())
        try { $ms.Close(); $ms.Dispose() } catch {}
        $s
    }
}

As sample usage, let's query the DBA Overflow data dump; it's much more manageable than the SO one. On my system the result set is 13 MB uncompressed and 3.5 MB compressed.

# SQL Server, so sqlcmd for illustration.
# Pipe results to compression and pipe compressed data into a file
sqlcmd -E -S .\sqli001 -d dbaoverflow -Q "select id, postid from votes order by id;" `
| compress-stream | Set-Content -Encoding ascii -Path c:\temp\data.b64

This should provide a compressed text file. To process it, use MemoryStream and GZipStream again:

$d = get-content c:\temp\data.b64
$data = [System.Convert]::FromBase64String($d)
$ms = New-Object System.IO.MemoryStream
$ms.Write($data, 0, $data.Length)
$ms.Seek(0,0) | Out-Null
$sr = New-Object System.IO.StreamReader(New-Object System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompress))

# $sr can now read decompressed data. For example,
$sr.ReadLine()
id          postid
$sr.ReadLine()
----------- -----------
$sr.ReadLine()
1           2

Doing the replacements and writing the final result into another file should be easy enough.
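For example, a rough sketch of that step, reusing the $sr reader created above (the output path and the strings to swap are placeholders):

# Sketch: stream decompressed lines through a replace and into a new file.
$out = New-Object System.IO.StreamWriter('c:\temp\data.replaced.sql')
while ($null -ne ($line = $sr.ReadLine())) {
    $out.WriteLine($line -replace 'sourceText', 'targetText')
}
$out.Close()
$sr.Close()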

vonPryz
  • I am afraid that this solution will not solve my issue. As written in the last update of the post, even piping to Out-Null blows up the memory. I would expect piping into a compression stream to do at least the same. – Bouni Jan 28 '19 at 11:57
  • @Bouni ReadLine returns the next line from the input stream, [or null](https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader.readline?view=netframework-4.7.2) if the end of the input stream is reached. – vonPryz Jan 28 '19 at 12:00
  • Thanks; sorry, I have updated my comment in the meantime. I am not sure this will work – Bouni Jan 28 '19 at 12:08
  • I think I will try to use your method but compress after dump and delete the original file. I will keep you informed – Bouni Jan 28 '19 at 12:37
  • If Powershell's memory usage is a killer, you might get good enough results with NTFS compression. Try that too. – vonPryz Jan 28 '19 at 12:44

In the end I used Python to replace the string in the dump file while sending it to mysql. It is fast enough and light on memory.
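
For reference, a minimal sketch of that approach (file names, connection options and the strings to replace are placeholders; the real script differs):

# Sketch: stream the dump line by line, replace the string, and feed the
# result straight into the mysql client so no second copy hits the disk.
import subprocess

mysql = subprocess.Popen(['mysql', '--defaults-extra-file=env.cnf', 'foo'],
                         stdin=subprocess.PIPE)

with open('dump.sql', encoding='utf8', errors='replace') as dump:
    for line in dump:
        mysql.stdin.write(line.replace('sourceText', 'targetText').encode('utf8'))

mysql.stdin.close()
mysql.wait()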

Bouni