5

I have a text file with following text

161624.406 : Send:[sometext1]
161624.437 : Send:[sometext2]
161624.468 : Send:[sometext3]
161624.499 : Send:[sometext4]
161624.531 : Send:[sometext5]

I want to keep only the sometext part in that file. Desired output is

sometext1
sometext2
sometext3
sometext4
sometext5

I am using the following code in Excel-VBA

Public Sub testa()
    a = "C:\Users\pankaj.jaju\Desktop\test.log"

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTxtFile = objFSO.OpenTextFile(a, 1)
    strText = objTxtFile.ReadAll
    objTxtFile.Close
    Set objTxtFile = Nothing


    Set objRegEx = CreateObject("VBScript.RegExp")
    With objRegEx
        .Global = True
        .MultiLine = True
        .Pattern = "\[([^]]+)\]"
        Set objRegMC = .Execute(strText)
        b = objRegMC(0).SubMatches(0)
    End With
    Set objRegEx = Nothing

    Debug.Print b
End Sub

The problem is the output is displayed as sometext1 only. How do I ReplaceAll in the text file and save the file with the desired text only.

Pankaj Jaju
  • 5,371
  • 2
  • 25
  • 41
  • I assume it is ok to essentially delete the file and recreate it or do you need it to operate on the same file? – Cor_Blimey May 20 '14 at 17:45
  • That is not a critical thing ... I can do either – Pankaj Jaju May 20 '14 at 17:46
  • @pnuts - Yes. But my requirement is to change the file itself as I am going to use that file in some other application too which is not Excel based and that app needs the file with stripped text only. – Pankaj Jaju May 20 '14 at 17:51

4 Answers4

3

The regex.Replace method should do the trick.

Separate your pattern into groups like this: "(.*?)(\[)([^]]+)(\])(.*?)"

And now you can replace your input string with the matching group which is group three in this case: objRegEx.Replace(strText, "$3")


Here is a helpful link to different examples of Regex within Excel.

Community
  • 1
  • 1
Automate This
  • 30,726
  • 11
  • 60
  • 82
2

I know you've already got your answer. But for a simple program like this, why even utilize regular expressions? Here's a 4 line script that can do the same thing - yes it has been tested.

Dim a, text: a = "C:\testenv\test.log"
text = split(CreateObject("Scripting.FileSystemObject").OpenTextFile(a, 1).ReadAll, vbcrlf)
for i=0 to ubound(text) : text(i) = replace(split(text(i), "[")(1), "]", "") : next
CreateObject("Scripting.FileSystemObject").OpenTextFile(a, 2).Write(Join(text, vbcrlf))
Rich
  • 4,134
  • 3
  • 26
  • 45
  • Personally I would expand this to be more readable so its easier to understand, maintain and learn from. Other than that, +1 Nice simple solution for the text format given by OP. – Automate This May 20 '14 at 20:38
  • @PortlandRunner Thanks man! :) I must admit, I've found some rather unusual formats to manipulate text due to the fact that Regular Expressions can make my brain throb. – Rich May 20 '14 at 20:46
  • @Rich - Nice one .... but I wanted to use regular expression as my actual problem is bit complex. I posted my test code and data to get some insights from SO community. – Pankaj Jaju May 20 '14 at 21:18
  • @PankajJaju Makes sense. Glad you got it resolved. Thanks for the +1's gents! – Rich May 20 '14 at 21:32
0

The regex obviously only grabbing the first match of the string. I'm not proficient enough with regex to come up with a regex only solution that takes into account \n

A simple workaround would be to use objTxtFile.ReadLine instead of ReadAll

Read each string in one by one, apply regex and get output.

Public Sub testa()
    a = "C:\Users\pankaj.jaju\Desktop\test.log"

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTxtFile = objFSO.OpenTextFile(a, 1)

    Do Until (objTxtFile.AtEndOfStream) 'Loop till EOF
        strText = objTxtFile.ReadLine 'Single line read instead of ReadAll

        Set objRegex = CreateObject("VBScript.RegExp")
        With objRegex
            .Global = True
            .MultiLine = True
            .Pattern = "\[([^]]+)\]"
            Set objRegMC = .Execute(strText)
            b = objRegMC(0).SubMatches(0)
        End With
        Set objRegex = Nothing

        Debug.Print b 'replace this with the code to output the string where you want etc
    Loop
    objTxtFile.Close
End Sub
ashareef
  • 1,846
  • 13
  • 19
  • Thanks for this but I have found a faster way to do it. – Pankaj Jaju May 20 '14 at 18:11
  • 1
    Try setting your text line to this: `objRegEx.Replace(strText, "")` and then you wont need `objRegMC`. Here is a [link](http://stackoverflow.com/q/22542834/2521004) that has an example of replacing the original excel cell using regex which is similar. – Automate This May 20 '14 at 18:11
  • @PortlandRunner - Using Replace, I ended up replacing the required substring. – Pankaj Jaju May 20 '14 at 18:23
  • @PortlandRunner - Actually I used your idea and I think it can be done. I just need to change my patter to `.Pattern = "(.*?)(([^]]+)\])(.*?)"` and `objRegEx.Replace(strText, "$2")`. The only problem now is to remove the square brackets. – Pankaj Jaju May 20 '14 at 18:26
  • 1
    @PankajJaju I see now what you meant. Can you move the square bracket into a new grouping `([)` – Automate This May 20 '14 at 18:29
  • @PortlandRunner - That was easy :-). `.Pattern = "(.*?)(\[)([^]]+)(\])(.*?)"` and `objRegEx.Replace(strText, "$3")` ... Voilà !!! – Pankaj Jaju May 20 '14 at 18:35
  • @PortlandRunner - Put it in answer section ... will tick it cause that is what i actually needed and it is the most elegant and fastest way i believe – Pankaj Jaju May 20 '14 at 18:36
0

OK ... found a way to do it (Thanks MSDN ... sometimes the docs are useful :-)). Sharing the answer

Public Sub testa()
    a = "C:\Users\pankaj.jaju\Desktop\test.log"

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTxtFile = objFSO.OpenTextFile(a, 1)
    strText = objTxtFile.ReadAll
    objTxtFile.Close
    Set objTxtFile = Nothing

    b = ""
    Set objRegEx = CreateObject("VBScript.RegExp")
    With objRegEx
        .Global = True
        .MultiLine = True
        .Pattern = "\[([^]]+)\]"
        Set objMatches = .Execute(strText)

        For Each Match In objMatches
            For Each SubMatch In Match.SubMatches
                b = b & SubMatch & vbCrLf
            Next SubMatch
        Next Match

    End With
    Set objRegEx = Nothing

    Set objTxtFile = objFSO.OpenTextFile(a, 2)
    objTxtFile.Write b
    objTxtFile.Close
    Set objTxtFile = Nothing
End Sub


EDIT - Based on @PortlandRunner's suggestion

Public Sub testa()
a = "C:\Users\pankaj.jaju\Desktop\test.log"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTxtFile = objFSO.OpenTextFile(a, 1)
strText = objTxtFile.ReadAll
objTxtFile.Close
Set objTxtFile = Nothing

b = ""
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
    .Global = True
    .MultiLine = True
    .Pattern = "(.*?)(\[)([^]]+)(\])(.*?)"
    b = objRegEx.Replace(strText, "$3")
End With
Set objRegEx = Nothing

Set objTxtFile = objFSO.OpenTextFile(a, 2)
objTxtFile.Write b
objTxtFile.Close
Set objTxtFile = Nothing
End Sub
Pankaj Jaju
  • 5,371
  • 2
  • 25
  • 41