6

I have a Winform project, created on Microsoft Framework 3.5. The users may have installed Windows 7 or Windows XP, and Office 2007 or above.

I'm working on in a procedure to get the clipboard data and put in on a C# DataTable. I already created a method to get the raw data from the clipboard and upload it in a DataTable.

But in some cases, the Excel data shows a value, but internally have another:

enter image description here

I'm investigating a method to get the raw data from Excel:

string XmlFmt = "XML Spreadsheet";
var clipboard = Clipboard.GetDataObject();

if (clipboard.GetDataPresent(XmlFmt))
{
    var clipData = clipboard.GetData(XmlFmt);
    StreamReader streamReader = new StreamReader((MemoryStream)clipData);
    streamReader.BaseStream.SetLength(streamReader.BaseStream.Length - 1);

    string xmlText = streamReader.ReadToEnd();
    var stream = new StringReader(xmlText);

    XmlDocument xmlDocument = new XmlDocument();
    xmlDocument.LoadXml(xmlText);

    DataSet dsExcelData = new DataSet();
    dsExcelData.ReadXml(new XmlNodeReader(xmlDocument));
}

But, this method retrieves me a DataSet with multiples tables with the configuration of each part of the Excel Data: enter image description here enter image description here

Basically, I want to convert these structures to a simple DataTable with only the raw data. Someone could help me with a hint how achieve this? ...I don't want to use a third party library in this implementation.

MiBol
  • 1,985
  • 10
  • 37
  • 64

3 Answers3

7

I found a clean and bullet-proof solution. Here the code:

First, a extension to convert a XmlDocument to XElement:

/// <summary> Convert XML Document to XDocument </summary>
/// <param name="xmlDocument">Attached XML Document</param>
public static XDocument fwToXDocument(this XmlDocument xmlDocument)
{
    using (XmlNodeReader xmlNodeReader = new XmlNodeReader(xmlDocument))
    {
        xmlNodeReader.MoveToContent();
        return XDocument.Load(xmlNodeReader);
    }
}

The complete function:

private DataTable clipboardExcelToDataTable(bool blnFirstRowHasHeader = false)
{
    string strTime = "S " + DateTime.Now.ToString("mm:ss:fff");
    var clipboard = Clipboard.GetDataObject();
    if (!clipboard.GetDataPresent("XML Spreadsheet")) return null;

    strTime += "\r\nRead " + DateTime.Now.ToString("mm:ss:fff");
    StreamReader streamReader = new StreamReader((MemoryStream)clipboard.GetData("XML Spreadsheet"));
    strTime += "\r\nFinish read " + DateTime.Now.ToString("mm:ss:fff");
    streamReader.BaseStream.SetLength(streamReader.BaseStream.Length - 1);

    XmlDocument xmlDocument = new XmlDocument();
    xmlDocument.LoadXml(streamReader.ReadToEnd());
    strTime += "\r\nRead XML Document " + DateTime.Now.ToString("mm:ss:fff");

    XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";
    DataTable dtData = new DataTable();

    var linqRows = xmlDocument.fwToXDocument().Descendants(ssNs + "Row").ToList<XElement>();

    for (int x = 0; x < linqRows.Max(a => a.Descendants(ssNs + "Cell").Count()); x++)
        dtData.Columns.Add("Column " + (x + 1).ToString());

    int intCol = 0;

    DataRow drCurrent;

    linqRows.ForEach(rowElement =>
        {
            intCol = 0;
            drCurrent = dtData.Rows.Add();
            rowElement.Descendants(ssNs + "Cell")
                .ToList<XElement>()
                .ForEach(cell => drCurrent[intCol++] = cell.Value);
        });

    if (blnFirstRowHasHeader)
    {
        int x = 0;
        foreach (DataColumn dcCurrent in dtData.Columns)
            dcCurrent.ColumnName = dtData.Rows[0][x++].ToString();

        dtData.Rows.RemoveAt(0);
    }

    strTime += "\r\nF " + DateTime.Now.ToString("mm:ss:fff");

    return dtData;
}

The process takes ~15 seconds to read ~25,000 rows.

Works perfectly for any kind of data. Basically, the method creates a grid with the same structure of the Excel WorkSheet. Merge of rows or columns will fill up the first cell able. All columns will be string DataType by default.

MiBol
  • 1,985
  • 10
  • 37
  • 64
  • I'm looking to use this code - is "Column {0}".fwFormat(x + 1)" a fixed width extension method on String? What does it do? – MrTelly Mar 19 '18 at 23:50
  • @MrTelly; my bad. Use this instead `dtData.Columns.Add("Column " + (x + 1).ToString());` – MiBol Mar 22 '18 at 14:30
  • 1
    there is a bug if there is blank cell. fortunally there is a attribute which give the current index (ss:Index give the column base 1). please replace : .ForEach(cell => drCurrent[intCol++] = cell.Value); with .ForEach(cell => { var newIndexStr = cell.Attribute(ssNs + "Index")?.Value; if (newIndexStr != null) { intCol = int.Parse(newIndexStr)-1; } drCurrent[intCol++] = cell.Value; }); – P. Sohm Jan 17 '23 at 18:08
5

If they are flat data you could do it like this.

private class Field
{
        public string Valor { get; set; }
}

private class Row
{
        public List<Field> Fields { get; set; }

        public Row(string value)
        {
            Fields = new List<Field>();
            var fieldsString = value.Split(new char[] {'\t'});
            foreach (string f in fieldsString)
            {
                Fields.Add(new Field {Valor = f});
            }
    }
}

public Parse()
{
    var data = Clipboard.GetDataObject();
    var datos = (string)data.GetData(DataFormats.Text);
    var stringRows = datos.Split(new Char[] {'\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
    var table = new List<Row>(stringRows.Length) ;
    foreach (string stringRow in stringRows)
    {
        table.Add( new Row(stringRow) );
    }
}
Diego658
  • 149
  • 4
  • Thanks, but doesn't work for my need. Try to get the Clipboard data as Text will retrieve me the values with the "mask" included. Example the cell with value "12312451512412", will retrieve me "1.23125E+13". Or "0%" instead of "0.3%". – MiBol May 05 '17 at 13:04
0

in some cases, the Excel data shows a value, but internally have another.

Using the XML method is the reason for that internal structure of multiple datatables. Try this method instead:

private void PasteFromExcel()
{
    DataTable tbl = new DataTable();
    tbl.TableName = "ImportedTable";
    List<string> data = new List<string>(ClipboardData.Split('\n'));
    bool firstRow = true;

    if (data.Count > 0 && string.IsNullOrWhiteSpace(data[data.Count - 1]))
    {
        data.RemoveAt(data.Count - 1);
    }

    foreach (string iterationRow in data)
    {
        string row = iterationRow;
        if (row.EndsWith("\r"))
        {
            row = row.Substring(0, row.Length - "\r".Length);
        }

        string[] rowData = row.Split(new char[] { '\r', '\x09' });
        DataRow newRow = tbl.NewRow();
        if (firstRow)
        {
            int colNumber = 0;
            foreach (string value in rowData)
            {
                if (string.IsNullOrWhiteSpace(value))
                {
                    tbl.Columns.Add(string.Format("[BLANK{0}]", colNumber));
                }
                else if (!tbl.Columns.Contains(value))
                {
                    tbl.Columns.Add(value);
                }
                else
                {
                    tbl.Columns.Add(string.Format("Column {0}", colNumber));
                }
                colNumber++;
            }
            firstRow = false;
        }
        else
        {
            for (int i = 0; i < rowData.Length; i++)
            {
                if (i >= tbl.Columns.Count) break;
                newRow[i] = rowData[i];
            }
            tbl.Rows.Add(newRow);
        }
    }

    DataGridView1.DataSource = tbl;
}

Ref: http://www.seesharpdot.net/?p=221

Edit:

I've done some tests and even using the "XML Spreadsheet" Clipboard format the data can get stored in exponential notation:

enter image description here

You could detect and convert these numbers: Parse a Number from Exponential Notation

Community
  • 1
  • 1
Jeremy Thompson
  • 61,933
  • 36
  • 195
  • 321
  • Thanks to sharing but doesn't work. In the example; the clipboard data has a cell value "1.23125E+13", but the real value of the cell is "12312451512412". This is one of the main reasons that I want to read the clipboard as XML SpreadSheet. – MiBol May 05 '17 at 13:01