I have a big .csv file (~600k lines, 56Mo), and inside there is database lines (on each line, there's an id, a client name, a client address, a client birthday date, etc). The problem is that, on some lines, some data is written badly (commas not supposed to be there, that mess up the columns).
I guessed that I had to do some RegEx to detect the problematic lines, and to replace the wrong commas with a dash or anything. I followed this article, and, after some tries, I got him to detect the messed-up lines.
Private Sub simpleRegex()
Dim strPattern As String: strPattern = "[^a-zA-Z0-9_,\-]([A-z]+)\,[^a-zA-Z0-9_,\-]([A-z]+)"
Dim strReplace As String: strReplace = "[^a-zA-Z0-9_,\-][A-z]+\-[^a-zA-Z0-9_,\-][A-z]"
Dim regEx As Object
Set regEx = CreateObject("VBScript.RegExp")
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("A1:A2000")
For Each cell In Myrange
If strPattern <> "" Then
strInput = cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
MsgBox (regEx.Replace(strInput, strReplace))
Else
End If
End If
Next End Sub
The problem is, this solution works if I wanted to change the cibled lines with an unique value, a predefined string (like strReplace="replacement words"). What I want, is to be able to target a succession of characters that match my pattern, and to replace only one character (the comma) in it. An example of it would be :
728,"HAY,HAYE",Marie,François,RAUTUREAU,85,29/05/1856,68;
into :
728,"HAY-HAYE",Marie,François,RAUTUREAU,85,29/05/1856,68;
Do you have a solution?
(Sorry if bad english, it's not my mother tongue).