1

What is the fastest way to remove characters for each cell? I have 300k rows, looping every single cell is not ideal. I tried text to column but i would need to know how many symbols there could be within each cell. Is there a better way to do this?

Here's what I have looping 300,000 cell. It takes way too much time.

For Each destineCell In destinRange
    cellVal = destineCell.Value
    If (InStr(cellVal, ", ")) Then
        removed = Left(cellVal, InStr(cellVal, ", ") - 1)
        ActiveCell.Value = removed & " "
    End If
    ActiveCell.Offset(1, 0).Select
Next destineCell

[[Update]] Here's what a sample data would look like..

New York, NY, USA
Rome, Italy - Tier 1 City

the data itself varies. Sometimes it'll have a comma, a dash or both.

brettdj
  • 54,857
  • 16
  • 114
  • 177
hammies
  • 1,344
  • 2
  • 22
  • 46
  • So you have a comma at the end that you want to get rid of in certain cells? Are there commas anywhere else in the string or just one? – Brian Aug 10 '16 at 14:23
  • 1
    Just use the [Range.Replace method](https://msdn.microsoft.com/en-us/library/office/ff194086.aspx) to remove what you want all at once. It can even be done in a single line of code: `destinRange.Replace(", ", " ")` – tigeravatar Aug 10 '16 at 14:24
  • Use an array instead of repeatedly writing to cells. See [this answer](http://stackoverflow.com/a/38797051/4088852) for some examples. Also `ActiveCell.Offset(1, 0).Select` and `ActiveCell.Value = removed & " "` are hands down the slowest thing in your loop. – Comintern Aug 10 '16 at 14:26
  • or `join(split(cell.value,", ")," ")` – Nathan_Sav Aug 10 '16 at 14:26
  • @Brian I updated a sample data what it'd look like – hammies Aug 10 '16 at 14:29
  • @pnuts basically remove everything after the comma or - – hammies Aug 10 '16 at 14:29
  • You can copy the range into `Array`, iterate over array replacing the values and then paste the array back. Thus you'll limit Excel recalc calls -- it should be light years faster on 300k, even faster than `Range.Replace`. – Logan Reed Aug 10 '16 at 14:32
  • @pnuts after the first comma/dash – hammies Aug 10 '16 at 14:36
  • @Comintern yea definitely slow. let me try array as you and logan suggested. – hammies Aug 10 '16 at 14:39
  • 3
    @Nathan_Sav I am not sure `join/split` is faster than VBA's built-in `Replace`. Your approach would only be beneficial if you do something else with the split array, otherwise it is suboptimal. – Logan Reed Aug 10 '16 at 14:55
  • 1
    @pnuts has asked some crucial questions. The clue is there :) – Siddharth Rout Aug 10 '16 at 15:17

5 Answers5

4

What is the fastest way to remove characters for each cell? I have 300k rows, looping every single cell is not ideal. I tried text to column but i would need to know how many symbols there could be within each cell. Is there a better way to do this?

@pnuts basically remove everything after the comma or - – JamAndJammies 42 mins ago

@pnuts after the first comma/dash – JamAndJammies 37 mins ago

Based on your question and what you said in the comment, the most simplest and fastest way which will not use VBA is

  1. Set a single letter delimiter which you feel will not be in the data. Let's say it is |. Or you can choose some other special character? Let's call this keyword.
  2. Press Ctrl + H to bring up find and replace dialog box and do a replace. Find , and replace it with the keyword. Similarly replace - with the keyword.
  3. Do Text To columns and split it on the keyword.
  4. Keep the 1st column and delete the rest of the columns
Community
  • 1
  • 1
Siddharth Rout
  • 147,039
  • 17
  • 206
  • 250
1

You can copy the range into Array, iterate over array replacing the values and then paste the array back. Thus you'll limit Excel recalc calls -- it should be light years faster on 300k, even faster than Range.Replace. Here is an example:

Sub fastReplace()

    ' range to run the replace operation on
    Dim rngRepl As Range
    Set rngRepl = ActiveSheet.[a1:a4]

    ' read range into array (it is a 2D array)
    Dim arr()
    arr = rngRepl

    ' do the replace
    Dim i As Long
    Dim j As Long
    For i = LBound(arr, 1) To UBound(arr, 1)
        For j = LBound(arr, 2) To UBound(arr, 2)
            arr(i, j) = Replace(arr(i, j), "o", "a")
        Next j
    Next i

    ' paste array back
    rngRepl = arr

End Sub
Logan Reed
  • 882
  • 7
  • 13
  • It's slightly faster but how do i remove everything after the characters? replace doesn't work – hammies Aug 10 '16 at 15:06
  • You'd use `Left` just like you did in your original example, but write it back into array, not the cell. If it is still not fast enough, your next option is to write it in C and reference a DLL... – Logan Reed Aug 10 '16 at 15:12
  • yes you are right! it was super fast (blink of an eye). thank you! – hammies Aug 10 '16 at 15:16
  • @JamAndJammies Yep, "You've been holding it wrong!" - Steve Jobs. – Logan Reed Aug 10 '16 at 15:27
1

Both methods actually take a similar time for 300,000 data points, around 0.5 seconds on my system.

I did update the array code to handle for the actual replacement required, and more importantly to test for the presence of the string to be replaced (else the code fails).

I didn't play with the extra columns in the other code, I would probably use a copy of the sheet before using this method so that all other columns could be deleted without impacting existing other data.

Code to time approaches

Sub Overall()
Dim dbTime As Double
Dim lngCalc As Long

With Application
 .ScreenUpdating = False
 lngCalc = .Calculation
 .EnableEvents = False
End With

dbTime = Timer()
Call fastReplace
Debug.Print Timer() - dbTime
dbTime = Timer()
Call SD
Debug.Print Timer() - dbTime

With Application
 .ScreenUpdating = True
 .Calculation = lngCalc
 .EnableEvents = True
End With

End Sub

Tidied up array

 Sub fastReplace()

    ' range to run the replace operation on
    Dim rngRepl As Range
    Dim arr()

    Set rngRepl = ActiveSheet.[a1:a300000]

    arr = rngRepl

    ' do the replace
    Dim i As Long
    Dim j As Long
    For i = LBound(arr, 1) To UBound(arr, 1)
        For j = LBound(arr, 2) To UBound(arr, 2)
            If InStr(arr(i, j), ", ") > 0 Then arr(i, j) = Left$(arr(i, j), InStr(arr(i, j), ", ") - 1)
        Next j
    Next i

    ' paste array back
    rngRepl.Value2 = arr

End Sub

Text to Columns

Sub SD()
 [a1:a300000].TextToColumns , , , , , , True
End Sub
brettdj
  • 54,857
  • 16
  • 114
  • 177
  • problem with texttocolumns is then you have to figure out the length within the cells and how many commas and other stuff within the cell... also need to insert columns and whatnot... so many more extra steps. – hammies Aug 11 '16 at 14:23
  • You would just copy it to a blank sheet, run the split, discard the rest. – brettdj Aug 12 '16 at 02:25
1

Or a one-line VBA equivalent of using LEFT on each string:

[a1:A300000] = Application.Evaluate("=IF(ISERR(FIND("","",A1:a300000)),A1:A300000,LEFT(A1:A300000,FIND("","",A1:A300000)-1))")
brettdj
  • 54,857
  • 16
  • 114
  • 177
0

EDIT

Try this and change your range, accordingly:

Option Explicit

Sub FindAndReplace()

Dim Arr() As Variant
Arr = Range("A1:A3")

Dim x As Integer

For x = LBound(Arr) To UBound(Arr)
    Arr(x, 1) = Left(Arr(x, 1), InStr(Arr(x, 1), ",") - 1)
Next x
Range("A1").Resize(UBound(Arr), 1) = Arr

End Sub
Brian
  • 2,078
  • 1
  • 15
  • 28
  • let me try but doesn't this just replace `,` with no space... so it wouldn't remove anything after the comma, no? – hammies Aug 10 '16 at 14:30
  • Yes, it does. It was a little unclear on that point. Please add what you want your examples to look like after replacing. – Brian Aug 10 '16 at 14:31
  • Yea i just tried it, it just replaces to nothing. I want everything to be removed after the symbol. It's slightly faster though. – hammies Aug 10 '16 at 14:35
  • @Brian - why do you do `Range.Resize`? It is exactly the same size since you copied it into the array. Just reuse the range... I've added the example too -- have a look. I also wouldn't assume it is just one column - see my example -- it will work on any range, even multi-column one. – Logan Reed Aug 10 '16 at 14:51
  • @LoganReed I just didn't put `Range("A1:A3")` like I did when I allocated the array. You did via a variable of data type `Range`. Why would the OP try text to column if he needed it for multiple columns? – Brian Aug 10 '16 at 16:12