30

I am trying to remove all extra blank rows and columns from an excel file using Interop Library.

I followed this question Fastest method to remove Empty rows and Columns From Excel Files using Interop and i find it helpful.

But i have excel files that contains a small set of data but a lot of empty rows and columns (from the last non empty row (or column) to the end of the worksheet)

I tried looping over Rows and Columns but the loop is taking hours.

I am trying to get the last non-empty row and column index so i can delete the whole empty range in one line

XlWks.Range("...").EntireRow.Delete(xlShiftUp)

enter image description here

Note: i am trying to get the last row containing data to remove all extra blanks (after this row , or column)

Any suggestions?


Note: The code must be compatible with SSIS Script Task environment

Yahfoufi
  • 2,220
  • 1
  • 22
  • 41
  • Are you trying to delete row 17 upwards, or row 7,8,13? – Phil May 13 '17 at 14:18
  • I just tried creating a worksheet with 10000 rows, every other row was empty. Deleting the 5000 empty rows took 38 seconds. – Phil May 13 '17 at 14:42
  • @Phil i am trying to get the last row containing data to remove all extra blanks (after this row , or column) – Yahfoufi May 15 '17 at 07:19
  • 3
    Well I am confused now. What should be deleted? Please edit the question and add image which will show the expected result. Should rows 7,8,13 be deleted? And should columns `H` and `E` be deleted? – Daniel Dušek Jul 20 '18 at 11:46
  • @dee i want to delete all rows after `17` and all columns after `J` for the other blanks rows (7,8,13) and columns (E,H) the answer in the provided link it working fine. – Yahfoufi Jul 23 '18 at 10:22
  • @dee these rows are empty but they are reserved in excel, so we need to delete them. if you import the file to other sources sometimes you will have more than 100000 empty rows – Yahfoufi Jul 23 '18 at 10:24
  • 1
    @Yahfoufi i have edited the answer, please have a look. – Daniel Dušek Jul 23 '18 at 14:49
  • 1
    "if you import the file to other sources sometimes you will have more than 100000 empty rows" - So is the real question: How to _sanitize_ a worksheet so that its `UsedRange` property only includes data (text or numbers)? Can rows and columns within this range be blank? – TnTinMn Jul 25 '18 at 01:36
  • @TnTinMn many times `UsedRange` return extra empty rows and columns. – Yahfoufi Jul 25 '18 at 12:53

8 Answers8

13

Update 1

If your goal is to import the excel data using c#, assuming that you have identified the the highest used index in your worksheet (in the image you posted it is Col = 10 , Row = 16), you can convert the maximum used indexes to letter so it will be J16 and select only the used range using and OLEDBCommand

SELECT * FROM [Sheet1$A1:J16]

Else, i don't think it is easy to find a faster method.

You can refer to these article to convert indexes into alphabet and to connect to excel using OLEDB:


Initial Answer

As you said you started from the following question:

And you are trying to "get the last row containing data to remove all extra blanks (after this row , or column)"

So assuming that you are working with the accept answer (provided by @JohnG), so you can add some line of code to get the last used row and column

Empty Rows are stored in a list of integer rowsToDelete

You can use the following code to get the last non empty rows with an index smaller than the last empty row

List<int> NonEmptyRows = Enumerable.Range(1, rowsToDelete.Max()).ToList().Except(rowsToDelete).ToList();

And if NonEmptyRows.Max() < rowsToDelete.Max() the last non-empty row is NonEmptyRows.Max() Else it is worksheet.Rows.Count and there is no empty rows after the last used one.

The same thing can be done to get the last non empty column

The code is Edited in DeleteCols and DeleteRows functions:

    private static void DeleteRows(List<int> rowsToDelete, Microsoft.Office.Interop.Excel.Worksheet worksheet)
    {
        // the rows are sorted high to low - so index's wont shift

        List<int> NonEmptyRows = Enumerable.Range(1, rowsToDelete.Max()).ToList().Except(rowsToDelete).ToList();

        if (NonEmptyRows.Max() < rowsToDelete.Max())
        {

            // there are empty rows after the last non empty row

            Microsoft.Office.Interop.Excel.Range cell1 = worksheet.Cells[NonEmptyRows.Max() + 1,1];
            Microsoft.Office.Interop.Excel.Range cell2 = worksheet.Cells[rowsToDelete.Max(), 1];

            //Delete all empty rows after the last used row
            worksheet.Range[cell1, cell2].EntireRow.Delete(Microsoft.Office.Interop.Excel.XlDeleteShiftDirection.xlShiftUp);


        }    //else last non empty row = worksheet.Rows.Count



        foreach (int rowIndex in rowsToDelete.Where(x => x < NonEmptyRows.Max()))
        {
            worksheet.Rows[rowIndex].Delete();
        }
    }

    private static void DeleteCols(List<int> colsToDelete, Microsoft.Office.Interop.Excel.Worksheet worksheet)
    {
        // the cols are sorted high to low - so index's wont shift

        //Get non Empty Cols
        List<int> NonEmptyCols = Enumerable.Range(1, colsToDelete.Max()).ToList().Except(colsToDelete).ToList();

        if (NonEmptyCols.Max() < colsToDelete.Max())
        {

            // there are empty rows after the last non empty row

            Microsoft.Office.Interop.Excel.Range cell1 = worksheet.Cells[1,NonEmptyCols.Max() + 1];
            Microsoft.Office.Interop.Excel.Range cell2 = worksheet.Cells[1,NonEmptyCols.Max()];

            //Delete all empty rows after the last used row
            worksheet.Range[cell1, cell2].EntireColumn.Delete(Microsoft.Office.Interop.Excel.XlDeleteShiftDirection.xlShiftToLeft);


        }            //else last non empty column = worksheet.Columns.Count

        foreach (int colIndex in colsToDelete.Where(x => x < NonEmptyCols.Max()))
        {
            worksheet.Columns[colIndex].Delete();
        }
    }
Hadi
  • 36,233
  • 13
  • 65
  • 124
10

Several years ago I created a MSDN code sample that permits a developer to get the last used row and column from a worksheet. I modified it, placed all needed code into a class library with a windows form front end to demo the operation.

Underlying code uses Microsoft.Office.Interop.Excel.

Location on Microsoft one drive https://1drv.ms/u/s!AtGAgKKpqdWjiEGdBzWDCSCZAMaM

Here I get the first sheet in an Excel file, get the last used row and col and present as a valid cell address.

Private Sub cmdAddress1_Click(sender As Object, e As EventArgs) Handles cmdAddress1.Click
    Dim ops As New GetExcelColumnLastRowInformation
    Dim info = New UsedInformation
    ExcelInformationData = info.UsedInformation(FileName, ops.GetSheets(FileName))

    Dim SheetName As String = ExcelInformationData.FirstOrDefault.SheetName

    Dim cellAddress = (
        From item In ExcelInformationData
        Where item.SheetName = ExcelInformationData.FirstOrDefault.SheetName
        Select item.LastCell).FirstOrDefault

    MessageBox.Show($"{SheetName} - {cellAddress}")

End Sub

Within the demo project I also get all sheets for an excel file, present them in a ListBox. Select a sheet name from the list box and get that sheet's last row and column in a valid cell address.

Private Sub cmdAddress_Click(sender As Object, e As EventArgs) Handles cmdAddress.Click
    Dim cellAddress =
        (
            From item In ExcelInformationData
            Where item.SheetName = ListBox1.Text
            Select item.LastCell).FirstOrDefault

    If cellAddress IsNot Nothing Then
        MessageBox.Show($"{ListBox1.Text} {cellAddress}")
    End If

End Sub

Upon first glance when opening the solution from the link above you will note there is a lot of code. The code is optimal and will release all objects immediately.

Karen Payne
  • 4,341
  • 2
  • 14
  • 31
  • It would be really handy if the library code was in this answer or in GitHub or CodeProject as most workplaces dont allow downloading from OneDrive or etc – Jeremy Thompson May 17 '17 at 00:44
  • Didn't know that. I just pushed my solution to GitHub. https://github.com/karenpayneoregon/excel-usedrowscolumns – Karen Payne May 17 '17 at 02:02
  • Nice one, good code, no double dots and good COM cleanup. One suggestion is the Extension Method `ToDataTable`its slow using Reflection, you will be amazed at the performance improvements if you use [FastMember](https://github.com/mgravell/fast-member) like this: `Dim data As IEnumerable(Of AccountInfo) = Accounts.GetAccounts(False) Using reader = FastMember.ObjectReader.Create(data, properties) dt.Load(reader) End Using` – Jeremy Thompson May 17 '17 at 03:26
  • 1
    @JeremyThompson thanks for the review. In regards to ToDataTable, I agree. I originally wrote this many years ago and never updated it. I'm going to update it this weekend (when I have time). – Karen Payne May 17 '17 at 10:07
7

I'm using ClosedXml which has useful 'LastUsedRow' and 'LastUsedColumn' methods.

var wb = new XLWorkbook(@"<path>\test.xlsx", XLEventTracking.Disabled);
var sheet = wb.Worksheet("Sheet1");

for (int i = sheet.LastRowUsed().RowNumber() - 1; i >= 1; i--)
{
    var row = sheet.Row(i);
    if (row.IsEmpty())
    {
        row.Delete();
    }
}

wb.Save();

This simple loop deleted 5000 out of 10000 rows in 38 seconds. Not fast, but a lot better than 'hours'. That depends on how many rows/columns you're dealing with of course which you don't say. However, after further tests with 25000 empty rows out of 50000 it does take about 30 minutes to delete the empty rows in a loop. Clearly deleting rows isn't an efficient process.

A better solution is to create a new sheet and then copy the rows you want to keep.

Step 1 - create sheet with 50000 rows and 20 columns, every other row and column is empty.

var wb = new XLWorkbook(@"C:\Users\passp\Documents\test.xlsx");
var sheet = wb.Worksheet("Sheet1");
sheet.Clear();

for (int i = 1; i < 50000; i+=2)
{
    var row = sheet.Row(i);

    for (int j = 1; j < 20; j += 2)
    {
        row.Cell(j).Value = i * j;
    }
}

Step 2 - copy the rows with data to a new sheet. This takes 10 seconds.

var wb = new XLWorkbook(@"C:\Users\passp\Documents\test.xlsx", XLEventTracking.Disabled);
var sheet = wb.Worksheet("Sheet1");

var sheet2 = wb.Worksheet("Sheet2");
sheet2.Clear();

sheet.RowsUsed()
    .Where(r => !r.IsEmpty())
    .Select((r, index) => new { Row = r, Index = index + 1} )
    .ForEach(r =>
    {
        var newRow = sheet2.Row(r.Index);

        r.Row.CopyTo(newRow);
    }
);

wb.Save();

Step 3 - this would be to do the same operation for the columns.

Phil
  • 42,255
  • 9
  • 100
  • 100
7
  • To get last non empty column/row index the Excel function Find can be used. See GetLastIndexOfNonEmptyCell.
  • Then the Excel Worksheet Function CountA is used to determine if the cells are empty and union the entire rows/columns to one rows/columns range.
  • This ranges are deleted finally at once.

public void Yahfoufi(string excelFile)
{
    var exapp = new Microsoft.Office.Interop.Excel.Application {Visible = true};
    var wrb = exapp.Workbooks.Open(excelFile);
    var sh = wrb.Sheets["Sheet1"];
    var lastRow = GetLastIndexOfNonEmptyCell(exapp, sh, XlSearchOrder.xlByRows);
    var lastCol = GetLastIndexOfNonEmptyCell(exapp, sh, XlSearchOrder.xlByColumns);
    var target = sh.Range[sh.Range["A1"], sh.Cells[lastRow, lastCol]];
    Range deleteRows = GetEmptyRows(exapp, target);
    Range deleteColumns = GetEmptyColumns(exapp, target);
    deleteColumns?.Delete();
    deleteRows?.Delete();
}

private static int GetLastIndexOfNonEmptyCell(
    Microsoft.Office.Interop.Excel.Application app,
    Worksheet sheet,
    XlSearchOrder searchOrder)
{
    Range rng = sheet.Cells.Find(
        What: "*",
        After: sheet.Range["A1"],
        LookIn: XlFindLookIn.xlFormulas,
        LookAt: XlLookAt.xlPart,
        SearchOrder: searchOrder,
        SearchDirection: XlSearchDirection.xlPrevious,
        MatchCase: false);
    if (rng == null)
        return 1;
    return searchOrder == XlSearchOrder.xlByRows
        ? rng.Row
        : rng.Column;
}

private static Range GetEmptyRows(
    Microsoft.Office.Interop.Excel.Application app,
    Range target)
{
    Range result = null;
    foreach (Range r in target.Rows)
    {
        if (app.WorksheetFunction.CountA(r.Cells) >= 1)
            continue;
        result = result == null
            ? r.EntireRow
            : app.Union(result, r.EntireRow);
    }
    return result;
}

private static Range GetEmptyColumns(
    Microsoft.Office.Interop.Excel.Application app,
    Range target)
{
    Range result = null;
    foreach (Range c in target.Columns)
    {
        if (app.WorksheetFunction.CountA(c.Cells) >= 1)
            continue;
        result = result == null
            ? c.EntireColumn
            : app.Union(result, c.EntireColumn);
    }
    return result;
}

The two functions for getting empty ranges of rows/columns could be refactored to one function, something like this:

private static Range GetEntireEmptyRowsOrColumns(
    Microsoft.Office.Interop.Excel.Application app,
    Range target,
    Func<Range, Range> rowsOrColumns,
    Func<Range, Range> entireRowOrColumn)
{
    Range result = null;
    foreach (Range c in rowsOrColumns(target))
    {
        if (app.WorksheetFunction.CountA(c.Cells) >= 1)
            continue;
        result = result == null
            ? entireRowOrColumn(c)
            : app.Union(result, entireRowOrColumn(c));
    }
    return result;
}

And then just call it:

Range deleteColumns = GetEntireEmptyRowsOrColumns(exapp, target, (Func<Range, Range>)(r1 => r1.Columns), (Func<Range, Range>)(r2 => r2.EntireColumn));
Range deleteRows = GetEntireEmptyRowsOrColumns(exapp, target, (Func<Range, Range>)(r1 => r1.Rows), (Func<Range, Range>)(r2 => r2.EntireRow));
deleteColumns?.Delete();
deleteRows?.Delete();

Note: for more informations have a look e.g. on this SO question.

Edit

Try to simply clear the content of all the cells which are after the last used cell.

public void Yahfoufi(string excelFile)
{
    var exapp = new Microsoft.Office.Interop.Excel.Application {Visible = true};
    var wrb = exapp.Workbooks.Open(excelFile);
    var sh = wrb.Sheets["Sheet1"];
    var lastRow = GetLastIndexOfNonEmptyCell(exapp, sh, XlSearchOrder.xlByRows);
    var lastCol = GetLastIndexOfNonEmptyCell(exapp, sh, XlSearchOrder.xlByColumns);

    // Clear the columns
    sh.Range(sh.Cells(1, lastCol + 1), sh.Cells(1, Columns.Count)).EntireColumn.Clear();

    // Clear the remaining cells
    sh.Range(sh.Cells(lastRow + 1, 1), sh.Cells(Rows.Count, lastCol)).Clear();

}
Daniel Dušek
  • 13,683
  • 5
  • 36
  • 51
4

Let's say the last corner cell with data is J16 - so no data in columns K onwards, or in rows 17 downwards. Why are you actually deleting them? What is the scenario and what are you trying to achieve? Is it clearing our formatting? Is is clearing our formulas which show an empty string?

In any case, looping is not the way.

The code below shows a way to use the Clear() method of Range object to clear all contents and formulas and formatting from a range. Alternatively if you really want to delete them, you can use the Delete() method to delete a whole rectangular Range in one hit. Will be much faster than looping...

//code uses variables declared appropriately as Excel.Range & Excel.Worksheet Using Interop library
int x;
int y;
// get the row of the last value content row-wise
oRange = oSheet.Cells.Find(What: "*", 
                           After: oSheet.get_Range("A1"),
                           LookIn: XlFindLookIn.xlValues,
                           LookAt: XlLookAt.xlPart, 
                           SearchDirection: XlSearchDirection.xlPrevious,
                           SearchOrder: XlSearchOrder.xlByRows);

if (oRange == null)
{
    return;
}
x = oRange.Row;

// get the column of the last value content column-wise
oRange = oSheet.Cells.Find(What: "*",
                           After: oSheet.get_Range("A1"),
                           LookIn: XlFindLookIn.xlValues, LookAt: XlLookAt.xlPart,
                           SearchDirection: XlSearchDirection.xlPrevious,
                           SearchOrder: XlSearchOrder.xlByColumns);
y = oRange.Column;

// now we have the corner (x, y), we can delete or clear all content to the right and below
// say J16 is the cell, so x = 16, and j=10

Excel.Range clearRange;

//set clearRange to ("K1:XFD1048576")
clearRange = oSheet.Range[oSheet.Cells[1, y + 1], oSheet.Cells[oSheet.Rows.Count, oSheet.Columns.Count]];
clearRange.Clear(); //clears all content, formulas and formatting
//clearRange.Delete(); if you REALLY want to hard delete the rows

//set clearRange to ("A17:J1048576")            
clearRange = oSheet.Range[oSheet.Cells[x + 1, 1], oSheet.Cells[oSheet.Rows.Count, y]];
clearRange.Clear(); //clears all content, formulas and formatting
//clearRange.Delete();  if you REALLY want to hard delete the columns
MacroMarc
  • 3,214
  • 2
  • 11
  • 20
2

You should be able to find the last non-empty row and column with something similar to this:

with m_XlWrkSheet
lastRow = .UsedRange.Rows.Count
lastCol = .UsedRange.Columns.Count
end with

That's VB.NET, but it should more or less work. That will return Row 16 and Column 10 (based on your picture above). Then you can use that to find the range you want to delete all in one line.

garroad_ran
  • 174
  • 2
  • 11
1

Seems that your problem has been resolved by Microsoft. Take a look at Range.CurrentRegion Property, which returns a range bounded by any combination of blank rows and blank columns. There's one inconvenience: this property cannot be used on a protected worksheet.

For further details, please see: How to Find Current Region, Used Range, Last Row and Last Column in Excel with VBA Macro

Some of SO members have mentioned about UsedRange property, which might be useful too, but the differ to CurrentRegion is that UsedRange returns a range includes any cell that has ever been used.
So, if you would like to get a LAST(row) and LAST(column) occupied by data, you have to use End property with XlDirection: xlToLeft and/or xlUp.

Note #1:
If your data are in a tabular format, you can simply find last cell, by using:

lastCell = yourWorkseet.UsedRange.End(xlUp)
firstEmtyRow = lastCell.Offset(RowOffset:=1).EntireRow

Note #2:
If your data aren't in a tabular format, you need to loop through the collection of rows and columns to find last non-blank cell.

Good luck!

Maciej Los
  • 8,468
  • 1
  • 20
  • 35
  • Thanks for the useful information. But int the use case i am working on it doesn't work, because it is not a well formatted tabular format – Yahfoufi Jul 27 '18 at 10:29
1

I think you can try using the Range.

Application excel = new Application();
Workbook workBook=  excel.Workbooks.Open("file.xlsx")
Worksheet excelSheet = workBook.ActiveSheet;
Range excelRange = excelSheet.UsedRange.Columns[1, Missing.Value] as Range;

var lastNonEmptyRow = excelRange.Cells.Count;

The above code works for me.

Yahfoufi
  • 2,220
  • 1
  • 22
  • 41
Shajin Chandran
  • 1,500
  • 1
  • 11
  • 12