3

I've created a tool in VBA for Excel that analyses .xlsx files we receive based on a number of criteria. One of those is the number of empty cells in the dataset. Unfortunately, though, I have noticed that a number of the files we receive contain cells with zero-length-strings in them which are being "incorrectly" counted as non-empty cells.

It's actually highly beneficial to the whole process if I am just able to remove these from the file.

I have googled this problem a lot, but the only solution I have been able to find so far is to loop through all cells in the sheet (I've also tried just constants, and also using Find to find all zls). This isn't very efficient as the worksheets have large amounts of data in them. EDIT: I have also tried the UsedRange.values = UsedRange.values method, but this removes leading zeroes, which I require.

I have also discovered that this works (-@- is a random string that is highly unlikely to be a single cell in my data, and if it's there, is fine to be removed):

ws.UsedRange.Replace what:=vbNullString, replacement:="-@-", _
                            lookat:=xlWhole, MatchCase:=False
ws.UsedRange.Replace what:="-@-", replacement:="", _
                            lookat:=xlWhole, MatchCase:=False

but if I only do it using the single replace it doesn't:

ws.UsedRange.Replace what:=vbNullString, replacement:="", _
                            lookat:=xlWhole, MatchCase:=False

The first is ok, but has a number of problems:

  1. It takes twice as long as a single replace
  2. It's not always clear if it has crashed, or continued to work
  3. If it does crash, I am left with a number of cells containing -@-, which isn't always obvious, and the tool should be ok to be used by someone who isn't able to understand VBA.

So my questions:

  1. Is there a way of doing this using just a single replace? Why does replace work with the double replace, but not the single?
  2. If the first isn't possible, then is there a way of 'rolling back' the replacement if the code crashes?
  3. Is there a way of updating the status bar to show how far through the replace is to prove the code is running (much like when running replace in Excel itself)?
  4. Or is there just a better way of doing it all?

Thanks in advance!

EDIT: Unfortunately, due to the data being processed, I need to retain formatting, including leading zeroes

EDIT: Here is an example of the sort of data I am looking at. I want to replace the zero-length-strings (which are non-blank cells that have no value in them) so they are a truly blank cell.

Table containing zero-length-strings

Gemma Down
  • 100
  • 10
  • How about you count the cell's containing zero length strings as well, and just add that count to the count of your empty cells? – Luuklag May 07 '19 at 11:15
  • Can we see a data example of those zero lenght values but non blank? – Foxfire And Burns And Burns May 07 '19 at 11:29
  • Thanks @Luuklag, but I'd like to be able to remove them as well as we realised they were actually potentially causing an issue later on in the process. I have also not been able to accurately find a way to do this without looping through all cells. @Foxfire unfortunately, I'm not able to share the files, as they are all confidential ones. However, you can get the same effect where you use something along the lines of ```= ""``` in a cell, then copy and paste values. Essentially, they 'appear' as blank cells, but aren't. – Gemma Down May 07 '19 at 11:34
  • A blank cell is an empty cell. A cell containing `= ""` **is not** a blank cell because it has a formula, even if the result of the formula is a nullstring, the cell itself contains a formula. Values in cells can be obtained from constants (you type them) or formulas. In case of formulas, even if the result is a nullstring, the cell is not blank, because it contains the formula itself. – Foxfire And Burns And Burns May 07 '19 at 11:38
  • @FoxfireAndBurnsAndBurns this is why I'm having the issue. They aren't blank cells, so they aren' being counted as blank cells, but to all intents and purposes they should be. So I'd like to remove them. They all contain zero-length strings rather than being blank or empty. – Gemma Down May 07 '19 at 11:41
  • Ohhh, so do they all those annoying cells contain a formula? Can you just replace with `""` all cells that contain formulas? Or do you have important info with formulas that can't be deleted? – Foxfire And Burns And Burns May 07 '19 at 11:42
  • @FoxfireAndBurnsAndBurns Not as such - they contain a zero-length-string, which is the result of the way they have been exported from the various systems they are exported from. So you click in the formula bar, there appears to be nothing there. But if you do =ISBLANK, they are clearly not blank (or empty). So according to Excel, they are cells containing values (vbNullString), but according to a human, they don't contain values) – Gemma Down May 07 '19 at 11:45
  • Were exactly are they located on your sheets? Are these the bottom rows or rightmost columns? – Luuklag May 07 '19 at 11:47
  • Try converting all cells to values. `ws.UsedRange.Value = ws.UsedRange.Value` or if you have formulas and need to keep them, the same but with `.Formula` – Foxfire And Burns And Burns May 07 '19 at 11:51
  • @Luuklag They are dotted throughout - I've added a screenshot in the original question of the sort of data that I am looking at. – Gemma Down May 07 '19 at 11:54
  • @FoxfireAndBurnsAndBurns Thanks for the suggestion, although I have already tried that. The issue there is I have some cells with leading zeroes, and that removes those, so it doesn't work for me. – Gemma Down May 07 '19 at 11:55
  • Do you have PowerQuery in your Excel-version? (with PQ you can replace empty strings to null, (in one go)) – EvR May 07 '19 at 13:12
  • _I'm not able to share the files, as they are all confidential_ but you can generate one with random data, so it easy to reproduce the problem. There are many ways to do it, but choosing the right one may depend on number of rows/columns/problematic cells or even formatting. – BrakNicku May 07 '19 at 15:45
  • @EvR Unfortunately not. I shall take a look at it though and see if it is something we are able to use - thanks. – Gemma Down May 08 '19 at 06:58
  • @BrakNicku I'd happily recreate a file with the 1m+ bits of data in it, but there is nowhere for me to upload it on here. I may not have enough points to do it, or something. I also have the issue that each data set is different, so whilst something may not be an issue on one of them, it is an issue on another one. – Gemma Down May 08 '19 at 07:00
  • You cannot upload it directly to SO, you have to use any external hosting service and add a link. I know the files possibly differ, but create one that easily shows the problem you are facing - long/crashing replace. – BrakNicku May 08 '19 at 07:12
  • just a FYI, MS has Inquire add-in (Excel 2013+) for analysis, compare and cleanup https://support.office.com/en-us/article/what-you-can-do-with-spreadsheet-inquire-ebaf3d62-2af5-4cb1-af7d-e958cc5fad42 – Slai May 08 '19 at 09:27

3 Answers3

0

You can have cells with formulas that return a null string or cells with null string constants. To clear the second kind of cells:

Sub KillNullConstants()
    Dim cell As Range, Konstants As Range, rng As Range
    Set Konstants = ActiveSheet.UsedRange.Cells.SpecialCells(xlCellTypeConstants)
    Set rng = Nothing
    For Each cell In Konstants
        If Len(cell) = 0 Then
            If rng Is Nothing Then
                Set rng = cell
            Else
                Set rng = Union(rng, cell)
            End If
        End If
    Next cell

    If Not rng Is Nothing Then
        rng.ClearContents
    End If
End Sub

To clear formula cells that are returning null strings, just change the SpecialCells code line.

Gary's Student
  • 95,722
  • 10
  • 59
  • 99
  • Thanks! Although as mentioned in the question, I have come across this before, but don't want to use something that loops through the cells as this is very inefficient for the data I have. – Gemma Down May 07 '19 at 12:13
  • you could limit the range that you search through to 10,000 cell range or something. at end of day you have to look at each cell object value, any tool you use or option you enable or w.e will end up doing this. – learnAsWeGo May 07 '19 at 15:28
  • 1
    @learnAsWeGo Thanks for the note. Although limiting to 10,000 will not clear all the blank cells from my 1m+ cells of data which is my end goal. The method in the question works much faster than the loop (I'm assuming Excel optimised it), which is why I have used it, I just want to see if there is something better than both. Thanks again though. – Gemma Down May 08 '19 at 07:04
0

I know this is a loop as well but maybe a quicker way:

Before:

enter image description here

Run this code on sample data:

Dim X As Double

Option Explicit

Sub Test()

Application.ScreenUpdating = False
Application.Calculation = xlManual

With ActiveWorkbook.Sheets(1).Range("A1:C7")
    For X = 1 To 3
        .AutoFilter Field:=X, Criteria1:=""
        .Columns(X).Offset(1, 0).SpecialCells(xlCellTypeVisible).Clear
    Next X
    .AutoFilter
End With

Application.Calculation = xlAutomatic
Application.ScreenUpdating = True

End Sub

After:

enter image description here

JvdV
  • 70,606
  • 8
  • 39
  • 70
  • Thanks! I shall give it a go. Only have around 30-40 columns, so may be faster than looping through all individual cells. – Gemma Down May 08 '19 at 07:10
  • If your sheet has automatic calculations you might want to include `Application.Calculation = xlManual` @GemmaDown. I'll edit the code. – JvdV May 08 '19 at 07:53
  • 1
    This appears to be working. Although I have updated it to reflect UsedRange (I know this can be risking, but in this case, I have code that removes all unused rows/columns in UsedRange already). – Gemma Down May 08 '19 at 09:09
0

Reading the sheet into an ADO Recordset then copying the Recordset on to a new sheet seems to fix this issue. Try this VBA code:

Sub copy_data()

Dim cn As Object
Set cn = CreateObject("ADODB.Connection")

With cn
    .Provider = "Microsoft.ACE.OLEDB.12.0"
    .ConnectionString = "Data Source=" & ThisWorkbook.FullName & ";" & _
        "Extended Properties=""Excel 12.0 Macro;IMEX=1;HDR=YES"";"
    .Open
End With

Dim rs As Object
Set rs = CreateObject("ADODB.Recordset")

rs.Open "SELECT * FROM [Sheet1$];", cn

Dim i As Integer
Dim fld As Object

With ThisWorkbook.Worksheets("Sheet2")
    .UsedRange.ClearContents

    i = 0
    For Each fld In rs.Fields
        i = i + 1
        .Cells(1, i).Value = fld.Name
    Next fld

    .Cells(2, 1).CopyFromRecordset rs
    .UsedRange.Columns.AutoFit
End With

rs.Close
cn.Close

End Sub

Notes:

  • change the sheet names in the code to match the sheet names you are using. You have to put a $ sign after the sheet name you use in rs.Open. If your sheet was called "Data" then you would put rs.Open "SELECT * FROM [Data$];", cn
  • in the connection string extended properties, HDR=YES was used to specify that the data has headers. ADO may change some of the column names if they contain certain characters - mainly the . character which will normally be replaced by the # character

Potential issues:

  • your data needs to be roughly in a table format - i.e. column names in row 1 with data values below each column name
  • any formulas in the data will be converted to values
  • cell formatting will not be copied over - but numbers formatted as text will remain as text so leading zeroes will be preserved
  • any cell value text longer than 255 characters is likely to be truncated to 255 characters (it is possible to work around this if needed)

Depending on your data, using ADO may create more issues than it solves

barrowc
  • 10,444
  • 1
  • 40
  • 53
  • 1
    Thanks! It looks like a really interesting approach, but unfortunately will not work here as I need to preserve the cell formatting of dates, numbers etc. Although I definitely appreciate that it doesn't lose the leading zeroes, and I will remember this for future cases where the formatting isn't important. – Gemma Down May 08 '19 at 07:05