As part of our project we download a large batch of EML files from a secure SFTP location. After downloading, we need to add a sub tag to each downloaded file; the files are around 90 MB each. I tried adding the sub tag with a PowerShell script that I found on another site (pasted below). It works fine for small files of 10 KB to 200 KB, but when I run the same script on the large files it gets stuck. Can anyone please help me get past this?

(Get-Content F:\EmlProcessor\UnZipped\example.eml) | 
    Foreach-Object {
        $_ # send the current line to output
        if ($_ -match "x-globalrelay-MsgType: ICECHAT") 
        {
            #Add Lines after the selected pattern 
            " X-Autonomy SubTag=GMAIL"
        }
    } | Set-Content F:\EmlProcessor\EmlProcessor\example2.txt

SAMPLE EML FILE

Date: Tue, 3 Oct 2017 07:44:32 +0000 (UTC)
From: XYZ
To: ABC
Message-ID: <1373565887.28221.1507075364517.JavaMail.tomcat@HKLVATAPP075>
Subject: Symphony: 2 users, 4 messages, duration 00:00
MIME-Version: 1.0
Content-Type: multipart/mixed; 
    boundary="----=_Part_28220_1999480254.1507075364517"

x-globalrelay-MsgType: GMAIL
x-symphony-StreamType: GMAIL
x-symphony-StreamID: RqN3HnR/ajgZvWOstxzLuH///qKcERyOdA==
x-symphony-ContentStartDateUTC: 1507016636610
x-symphony-ContentStopDateUTC: 1507016672387
x-symphony-FileGeneratedDateUTC: 1507075364516

------=_Part_28220_1999480254.1507075364517
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE html><html><body><p><font color=3D"grey">Message ID: Un/pfFrGvvVy=
T6quhMBKjX///qEezwdFdA=3D=3D</font><br>2017-10-03T07:43:56.610Z  0

----
------
-----
</HTML>

As shown in the sample input file above, I must add the text "X-Autonomy SubTag" above or below "x-globalrelay-MsgType".

I tried to add the sub tag to a sample file of about 90 MB and, as I said, the script got stuck. My actual requirement is to add it to nearly 2,000 files by looping through each file. So far I have only tried the code above on one file, without success. I am very new to batch and Windows PowerShell scripting, so any quick help is appreciated.

Squashman
rajendra
  • Take a look at these links: http://rkeithhill.wordpress.com/2007/06/17/optimizing-performance-of-get-content-for-large-files/ and see Roman's answer here: https://stackoverflow.com/questions/4192072/how-to-process-a-file-in-powershell-line-by-line-as-a-stream – Squashman Oct 04 '17 at 21:47

1 Answer

Are you sure it is stuck, or is it just taking longer? Your code has to iterate through thousands of lines to find a match.

I did not have a large text file to test with, so I converted a large CSV (60 MB) to TXT, and the code below ran pretty fast on it (10-15 seconds).
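
One way to check whether a script is stuck or merely slow is to time the read and search on a single file. This is only a rough sketch: Measure-TagSearch is a hypothetical helper name, not something from the question, and it times just the Get-Content and Select-String steps.

```powershell
# Hypothetical helper: times how long it takes to read a file and search it
# for the header line. Returns a TimeSpan.
function Measure-TagSearch {
    param(
        [Parameter(Mandatory)][string]$Path
    )

    # Measure-Command runs the script block and returns the elapsed time.
    Measure-Command -Expression {
        Get-Content -Path $Path |
            Select-String -Pattern 'x-globalrelay-MsgType' |
            Out-Null
    }
}
```

Calling something like `(Measure-TagSearch -Path 'F:\EmlProcessor\UnZipped\example.eml').TotalSeconds` on one of the large files should show whether the time grows steadily with file size or never finishes.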

Note: Since you are new and are just discovering the power of PowerShell, I am going to be really generous. Most people would expect you to put in some effort yourself, but I have faith that you will at least try to understand what the script is doing. If you run scripts you get here directly in your environment without testing, you could end up doing some serious damage, so at least for the sake of testing, make sure you understand what each line does. I have edited the code to use a function for scalability. I could use multi-threading to speed up the process, but since this is a CPU-heavy operation, I do not think it would do much good.

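# Wrap the work in a function so it can be called once per file
Function Insert-SubTag ($Path)
{
    $FileName = $Path | Split-Path -Leaf

    # @() keeps the result an array even if the file has a single line
    $File = @(Get-Content -Path $Path)

    # Take only the first match; the header is assumed to appear once per file
    $Match = $File | Select-String -Pattern "x-globalrelay-MsgType" | Select-Object -First 1

    if ($Match)
    {
        # LineNumber starts at 1 but array indexes start at 0.
        # The element becomes two lines: the original header plus the sub tag.
        $File[$Match.LineNumber - 1] = $Match.Line + [Environment]::NewLine + " X-Autonomy SubTag=GMAIL"
    }

    $SavePath = "F:\EmlProcessor\UnZipped2\$FileName" # The save folder could also be passed as a parameter, like $Path
    $File | Set-Content -Path $SavePath
}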

# If you have the list of files in a text file, use this
# $FileList = Get-Content C:\FileList.txt

# If you have a folder and want to iterate through each file, use this instead
# (-File skips subfolders)
$FileList = (Get-ChildItem -Path "F:\EmlProcessor\UnZipped" -File).FullName

Foreach ($FilePath in $FileList)
{
    Insert-SubTag -Path $FilePath
}

This assumes that x-globalrelay-MsgType appears only once in the text file.
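
If memory or speed becomes a problem with 90 MB files, a streaming variant along the lines of the links in Squashman's comment may help. The sketch below is not the code above; it swaps Get-Content for .NET's StreamReader and StreamWriter so only one line is held in memory at a time. Add-SubTagStreaming and its parameter names are made up for illustration, and unlike the function above it adds the sub tag after every matching line, as the script in the question does.

```powershell
# Hypothetical streaming variant: copies the file line by line and writes the
# sub tag after each line that matches the pattern.
function Add-SubTagStreaming {
    param(
        [Parameter(Mandatory)][string]$Path,         # full path of the source file
        [Parameter(Mandatory)][string]$Destination,  # full path of the file to write
        [string]$Pattern = 'x-globalrelay-MsgType',
        [string]$SubTag  = ' X-Autonomy SubTag=GMAIL'
    )

    $Reader = $null
    $Writer = $null
    try {
        $Reader = New-Object System.IO.StreamReader -ArgumentList $Path
        $Writer = New-Object System.IO.StreamWriter -ArgumentList $Destination

        # ReadLine returns $null at the end of the file
        while ($null -ne ($Line = $Reader.ReadLine())) {
            $Writer.WriteLine($Line)
            if ($Line -match $Pattern) {
                $Writer.WriteLine($SubTag)
            }
        }
    }
    finally {
        # Close both files even if something fails part way through
        if ($Writer) { $Writer.Dispose() }
        if ($Reader) { $Reader.Dispose() }
    }
}
```

Pass full paths to both parameters, because .NET resolves relative paths against the process working directory rather than the current PowerShell location. The writer uses its default encoding (UTF-8), so check that this suits your files before running it on real data.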

Do not forget to consider selecting this as the answer if it works for you.

Sid
  • Thanks Robin, the script is working fine. It took around 1 minute for an 80 MB file, which is OK. But how do I loop through multiple files, add the sub tag with the above code, and save each one with the same file name? Sorry for asking everything, but I am very new to batch and shell scripting. – rajendra Oct 05 '17 at 05:21
  • first you have to create the list of files, for example by doing `$FileList = Get-Childitem -File`. Then you can loop through that list: `Foreach ($Item in $FileList){$File = Get-Content -Path $Item.Fullname ...}`. – whatever Oct 05 '17 at 09:12
  • But how do I save the file with the same name in a different folder once the sub tag text has been added? – rajendra Oct 05 '17 at 11:58
  • `$File | Set-Content -Path $Item.Fullname` – whatever Oct 05 '17 at 12:07
  • Sorry, for a different target location you have to use `$File | Set-Content -Path "\$($Item.Name)"` – whatever Oct 05 '17 at 12:54
  • Just Convert that to a function and like @whatever mentioned, iterate through each file and call the function – Sid Oct 05 '17 at 13:11
  • Hi Robin Sidharth, thank you very much for your help. I do understand that asking for everything is not right, but this was very urgent and I was under a lot of pressure to complete the script as early as possible, so I had no other option but to post it here. Anyway, I came across another script that uses [System.IO.File] to read all lines, and I have posted it as an answer, though I am not sure which one is the better option; the processing time for both scripts is the same. Again, thanks for your help, it really helped me a lot. – rajendra Oct 06 '17 at 05:14
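
For reference, the loop discussed in these comments, combined with the [System.IO.File] approach rajendra mentions, might look roughly like the sketch below. rajendra's actual script is not shown in this thread, so this is a guess at its shape; Add-SubTagToFolder and its parameter names are hypothetical. It reads each file with ReadAllLines and saves the result under the same file name in a different folder.

```powershell
# Hypothetical helper: processes every file in $SourceFolder and writes the
# tagged copy, with the same file name, to $TargetFolder (which must exist).
function Add-SubTagToFolder {
    param(
        [Parameter(Mandatory)][string]$SourceFolder,  # full path
        [Parameter(Mandatory)][string]$TargetFolder,  # full path
        [string]$Pattern = 'x-globalrelay-MsgType',
        [string]$SubTag  = ' X-Autonomy SubTag=GMAIL'
    )

    $Items = Get-ChildItem -Path $SourceFolder -File
    foreach ($Item in $Items) {
        # Read the whole file into a string array in one call
        $Lines = [System.IO.File]::ReadAllLines($Item.FullName)

        # Build the output, adding the sub tag after each matching line
        $Output = New-Object 'System.Collections.Generic.List[string]'
        foreach ($Line in $Lines) {
            $Output.Add($Line)
            if ($Line -match $Pattern) { $Output.Add($SubTag) }
        }

        # Same file name, different folder
        $SavePath = Join-Path -Path $TargetFolder -ChildPath $Item.Name
        [System.IO.File]::WriteAllLines($SavePath, $Output.ToArray())
    }
}
```

A call such as `Add-SubTagToFolder -SourceFolder 'F:\EmlProcessor\UnZipped' -TargetFolder 'F:\EmlProcessor\UnZipped2'` would cover the folders named in the answer. Try it on a copy of a few files first.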