0

I've a Excel (could be 2010 or 2013 don't know if this can be a problem later) document with four columns. The first three columns store phone numbers which basically are a string with 10 or more characters. The four column only and forever will store 1, 2, 3 or 4 and it's a category. I need to check if each number in column A appears in column B and C or not so I think in read all the Excel cells for each column and store in a list (isn't implemented yet because the problem I'll explain below). For that I make this code:

private void btnCargarExcel_Click(object sender, EventArgs e)
        {
            if (this.openFileDialog1.ShowDialog() == DialogResult.OK)
            {

                if (System.IO.File.Exists(openFileDialog1.FileName))
                {
                    filePath.Text = openFileDialog1.FileName.ToString();

                    Excel.Application xlApp;
                    Excel.Workbook xlWorkBook;
                    Excel.Worksheet xlWorkSheet;
                    Excel.Range range;

                    string str;

                    int rCnt = 0;

                    xlApp = new Microsoft.Office.Interop.Excel.Application();
                    xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true, Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
                    xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);

                    range = xlWorkSheet.UsedRange;

                    for (rCnt = 1; rCnt <= range.Rows.Count; rCnt++)
                    {
                        str = (range.Cells[rCnt, 1] as Excel.Range).Value2.ToString();
                        //bd.Add(cleanString(str));

                        bd.Add(cleanString(str, 10));
                    }

                    for (rCnt = 1; rCnt <= range.Rows.Count; rCnt++)
                    {
                        str = (range.Cells[rCnt, 2] as Excel.Range).Value2.ToString();
                        //bd.Add(cleanString(str));

                        bl.Add(cleanString(str, 10));
                    }

                    for (rCnt = 1; rCnt <= range.Rows.Count; rCnt++)
                    {
                        str = (range.Cells[rCnt, 3] as Excel.Range).Value2.ToString();
                        //bd.Add(cleanString(str));

                        cm.Add(cleanString(str, 10));
                    }

                    nrosProcesados.Text = bd.Count().ToString();
                    listBox1.DataSource = bd;

                    noProcesadosBL.Text = bl.Count().ToString();
                    listBox2.DataSource = bl;

                    noProcesadosCM.Text = cm.Count().ToString();
                    listBox3.DataSource = cm;

                    xlWorkBook.Close(true, null, null);
                    xlApp.Quit();

                    releaseObject(xlWorkSheet);
                    releaseObject(xlWorkBook);
                    releaseObject(xlApp);
                }
                else
                {
                    MessageBox.Show("No se pudo abrir el fichero!");
                    System.Runtime.InteropServices.Marshal.ReleaseComObject(appExcel);
                    appExcel = null;
                    System.Windows.Forms.Application.Exit();
                }
            }
        }

So I iterate trough cells in a column and store every number in a list after make some strings changes as you can see in code. The problem here is that column A has 797340 cells, column B has 91617 cells and column C has 95891 cells so if I run the application, load the Excel and wait my PC hang out (even having 12GB of RAM and with Core i3 processor) and I need to open Task Manager and end the task. What is the best solution in order to get what I want (leave only the non repeated numbers) and not to hang out my PC? Will be fine divide things in separately threads (I don't know much about this because I'm starting in C# so any help will be appreciated) for each cycle? What is yours opinion about this topic?

EDIT: Add a new and clean method

So after read and read and get help from some members here I improve the code a bit but now I've another problem (commented below the code). See the code now:

// this goes first when I declare vars
public static System.Array objRowAValues;

// this goes in action when I click the button (I leave only relevant part)
Excel.Application xlApp;
Excel.Workbook xlWorkBook;
Excel.Worksheet xlWorkSheet;
Excel.Range range, rngARowLast;

string str;
int rCnt = 0;

long lastACell, fullRow;

xlApp = new Microsoft.Office.Interop.Excel.Application();
xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true, Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
xlWorkSheet = (Excel.Worksheet) xlWorkBook.Worksheets.get_Item(1);

range = xlWorkSheet.UsedRange;

fullRow = xlWorkSheet.Rows.Count;
lastACell = xlWorkSheet.Cells[fullRow, 1].End(Excel.XlDirection.xlUp).Row;
rngARowLast = xlWorkSheet.get_Range("A1", "A" + lastACell);
objRowAValues = (System.Array) rngARowLast.Cells.Value;

Now because I'll fill a ListBox with values from objRowAValues and ListBox only accepts List as DataSource then I need to convert objRowAValues to a List of string. I try this but itsn't working for me. Any help?

Matt Ellen
  • 11,268
  • 4
  • 68
  • 90
Reynier
  • 2,420
  • 11
  • 51
  • 91
  • Sorry this approach has one utterly flaw: It is slow. To fix this: Load entire columns from Excel - this is magnitudes faster, especially on big datasets. It goes like this thread: http://stackoverflow.com/questions/536636/write-array-to-excel-range If you only have xlsx documents use a third party library like EPPLUS to read data. It is a lot faster. – Christian Sauer Apr 06 '13 at 06:28
  • Store the values from each column in an array directly (without looping though each cell in the column) and then perform the actions on the array. SO has many links which show you how to do this. – Siddharth Rout Apr 06 '13 at 06:43
  • @ChristianSauer thanks but I don't understand how to load entire columns from Excel unless you're talking about the first and accepted reply which I not understand so well so a bit of code will be very helpful for me ;) – Reynier Apr 06 '13 at 14:28
  • @ChristianSauer I've found [this](http://stackoverflow.com/questions/13886330/read-excel-first-column-using-c-sharp-into-array) but don't know how to use in my code because they use `System.Array` and I'm using `List bd = new List();` any help? – Reynier Apr 06 '13 at 18:53
  • No need to be scared of VBA. If this is part of a bigger application, you can kick off the macro in C#. – Robert Co Apr 06 '13 at 21:09
  • @Reynier: Last time I used a two staged approach: First I imported is as a simple array of object, then converted it, item by item, to an array of string - it is still faster than the alternative. – Christian Sauer Apr 07 '13 at 16:58
  • @RobertCo it's a simple application made just for this purpose. So what is your suggestion regarding VBA? – Reynier Apr 07 '13 at 19:32
  • @ChristianSauer ok, good but can you help me writing a small piece of code (as a answer so I can accept) in order to see how did you do in the past? As I said I just know PHP, C# is totally new for me and I don't know yet how to do things – Reynier Apr 07 '13 at 19:33
  • @Reynier: do something along this line: list myList = new list; foreach(object elem in objRowAValues) { myList.add(elem); } This should solve your problem. COuld you accept my anweser please? – Christian Sauer Apr 13 '13 at 15:41
  • @ChristianSauer answer accepted, where I should write this line? At the end? After I create the `System.Array`? – Reynier Apr 13 '13 at 19:06
  • @ChristianSauer I tried what you said and get this error `Error Argument 1: cannot convert from 'object' to 'string'` – Reynier Apr 13 '13 at 21:52
  • ah sorry, you need to get the tostring value from the opject: myList.add(elem.ToString()); Yes, it should be used at the end. – Christian Sauer Apr 14 '13 at 13:35
  • @Reynier no porblem at all! Have fun! – Christian Sauer Apr 14 '13 at 18:42

1 Answers1

1

Unfortunately, I am more of a VB.NET guy - so I have converted some code for you. I hope this works out of the box - I do not use this kind of tool here, so I had no way to test it.

public void test()
{
    object[,] RaWData = null;

    dynamic range = xlWorkSheet.UsedRange;

    //i am unsure here about the correct order - I do not work with excel at Work, so you might have to change the following lange, if columns needs to be before rows or so
    RaWData = range.value2;

    //I am using a list here, because Lists are a lot easier to work with then simple arrays
    List<List<string>> RealData = new List<List<string>>();

    //start at 1  because the excel-delivered array do not have values at index 0 - this is the only 1-based array you will ever encounter in .net
    for (x = 1; x <= Information.UBound(RaWData, 1); x++) {
        List<string> templist = new List<string>();
        for (y = 1; y <= Information.UBound(RaWData, 2); y++) {
            templist.Add(RaWData[x, y].ToString());
        }
        RealData.Add(templist);
    }

    //you should be finished here...
}
Christian Sauer
  • 10,351
  • 10
  • 53
  • 85
  • Thanks @christian-sauer but I don't know how to move the code from VB.NET to C# :-(, did you know how to do the same but using VBA instead? – Reynier Apr 10 '13 at 18:27
  • This is C# - i had it converted for you.... As VBA it makes no sense - because the important part is totally useless in vba. – Christian Sauer Apr 11 '13 at 06:51
  • You're right I didn't read right (**so I have converted some code for you**). Now it's possible to add some way to get the time that C# takes to parse the file? – Reynier Apr 12 '13 at 16:52
  • @chistian-sauer can you take a look at the edit I made earlier to the main post? – Reynier Apr 12 '13 at 22:51