1

I need to read an Excel xlsx file over the network, without using 3rd party tools or having Office installed. I can write in Powershell or C# for this project and am not having luck finding a solution for either language with my requirements. It needs to be native .net.

Does anyone have an example of how to do this?

This solution doesn't work powershell excel access without installing Excel

As I do not have 'Microsoft.Jet.OLEDB.4.0' registered and apparently from searching, you can only use it on 32 bit machines?

Bbb
  • 517
  • 6
  • 27
  • I think you can only run it from 32 bit code, which is not quite the same thing. .net code runs in 32 bit mode by default, but powershell does not. – ADyson Jul 31 '19 at 18:05
  • But what's the issue with using 3rd party tools? Most of the .net ones which can read excel files are free to use. OledB will allow you to make SQL queries against a worksheet, but it's not clear if that's sufficient functionality for you or not – ADyson Jul 31 '19 at 18:07
  • 2
    What's wrong with EPPlus.dll as mentioned in your link. – vrdse Jul 31 '19 at 18:09
  • 3
    xlsx files are essentially just zip archives with xml files containing the data from the various sheets and what not. With a bit of effort you could read in the xml and work with that if you are really desperate to not use any third party tools. – TheMadTechnician Jul 31 '19 at 18:17
  • EPPlus is a very good library that you can use in your project which helps you read excel files without installing any additional third party software. – Gopi Jul 31 '19 at 18:25
  • @Gopi EPPlus **is** 3rd party software but yes I agree it's a good library – ADyson Jul 31 '19 at 18:39
  • To those responding about 3rd party software, I put that I can't use 3rd party for a reason and it wasn't to debate about it. @TheMadTechnician, that's exactly what I've had to do, thank you – Bbb Aug 02 '19 at 16:45

3 Answers3

3

If you decide use C# you could use OpenXML. Just install it using the Nuget package manager. It doesn't need Office installed.

Example:

      using (SpreadsheetDocument spreadsheetDocument =   SpreadsheetDocument.Open("myFile.xlsx", false))
        {
            WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
            SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
            foreach (Row r in sheetData.Elements<Row>())
            {
                foreach (Cell c in r.Elements<Cell>())
                {
                    Console.Write(c.InnerText + ",");
                }
                Console.WriteLine("\n");
            }
            Console.WriteLine();
            Console.ReadKey();
        }

And add this to the using's

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
Lvcios
  • 49
  • 1
  • 6
2

ImportExcel PowerShell module is a wrapper for EPPLus.dll. It provide simple commmands that you can call to interact with Excel files (xlxs only).

The repository is found here, written by Doug Finke: https://github.com/dfinke/ImportExcel

You can get it from the PowerShell Gallery as well with Install-Module ImportExcel

  • I've used this before and it works well, unfortunately I can't place that on a server – Bbb Aug 02 '19 at 16:43
1

This is not a complete solution for all scenarios as I didn't have time to map out every xml, but this should get anyone started with a similar situation.

Hopefully this will save someone else time in having to develop something similar, the code below converts an xlsx into a zip file, decompresses it and then maps the different xmls to create a virtual representation of the spreadsheet.

It's not the fastest code, but if you were looking for speed you wouldn't use Powershell anyway.

<#

    $columnListOfTuples and $rowListOfTuples have everything mapped

    Each xlsx file is a compressed group of xml files into a zip file that is renamed .xlsx
    There is an xml of shared strings that are then represented by a single numerical value to save space
    There is an xml for each sheet in the spread sheet that has numerical references to the values in the shared strings xml
    To produce a virtual/programmatic representation of the spreadsheet, the code will read the sheet, then read the numerical reference than then cross reference that to the shared strings value
    I have no mapped out all functions and xmls in the compressed file, but this is enough to get you started
    It's only configured for columns A - Z, you'll have to write code for additional columsn
    You need powershell 5.0 or later to have the native unzip feature
    It assumes your first worksheet is named worksheet1
#>

cls;

#Region Read Xlsx file

    #Region Create temporary zip file and location

    $fileDate = [DateTime]::Now.ToShortDateString();
    $fileDate = $fileDate.Replace("/", "-");
    $fileTime = [DateTime]::Now.ToShortTimeString();
    $fileTime = $fileTime.Replace(":", ".");
    $fileTime = $fileTime.Replace(" ", ".");
    $fileDateTime = $fileDate + "_" + $fileTime;

    $networkFile = "http://[xxxxx].xlsx";
    $originalFile = "c:\temp\file.xlsx"; # file must exist and location must have writable rights
    #Invoke-WebRequest -Uri $networkFile -OutFile $originalFile -UseDefaultCredentials; # this is to download a network file locally, may not be needed

    $index = $originalFile.LastIndexOf("\");
    $originalFileDirectory = $originalFile.Remove($index);
    $tempCopiedFileName = $originalFileDirectory + "\tempXlsx-" + $fileDateTime + ".zip";
    $desinationPath = $originalFileDirectory + "\xlsxDecompressed";

    #Endregion /Create temporary zip file and location

    #Region Map all of the zip/xlsx file paths

    $relsXmlFile = $desinationPath + "\_rels\.rels";
    $item1XmlRelsXmlFile = $desinationPath + "\customXml\_rels\item1.xml.rels";
    $item2XmlRelsXmlFile = $desinationPath + "\customXml\_rels\item2.xml.rels";
    $item3XmlRelsXmlFile = $desinationPath + "\customXml\_rels\item3.xml.rels";
    $item1XmlFile = $desinationPath + "\customXml\item1.xml";
    $item2XmlFile = $desinationPath + "\customXml\item2.xml";
    $item3XmlFile = $desinationPath + "\customXml\item3.xml";
    $itemProps1XmlFile = $desinationPath + "\customXml\itemProps1.xml";
    $itemProps2XmlFile = $desinationPath + "\customXml\itemProps2.xml";
    $itemProps3XmlFile = $desinationPath + "\customXml\itemProps3.xml";
    $appXmlFile = $desinationPath + "\[filename]\docProps\app.xml";
    $coreXmlFile = $desinationPath + "\[filename]\docProps\core.xml";
    $customXmlFile = $desinationPath + "\[filename]\docProps\custom.xml";
    $workbookXmlRelsXmlFile = $desinationPath + "\xl\_rels\workbook.xml.rels";
    $theme1XmlFile = $desinationPath + "\xl\theme\theme1.xml";
    $sheet1XmlRelsXmlFile = $desinationPath + "\xl\worksheets|_rels|sheet1.xml.rels";
    $workSheet1XmlFile = $desinationPath + "\xl\worksheets\sheet1.xml";
    $sharedStringsXmlFile = $desinationPath + "\xl\sharedStrings.xml";
    $stylesXmlFile = $desinationPath + "\xl\styles.xml";
    $workbookXmlFile = $desinationPath + "\xl\workbook.xml";
    $contentTypesXmlFile = $desinationPath + "\[Content_Types].xml";

    #Endregion /Map all of the zip/xlsx file paths

    Copy-Item $originalFile -Destination $tempCopiedFileName -Recurse; # copy the xlsx file as a temporary zip file
    Expand-Archive -LiteralPath $tempCopiedFileName -DestinationPath $desinationPath -Force; # unzip the file
    [xml]$fileLines = Get-Content($sharedStringsXmlFile); # read the contents of the shared strings file

    #Region Map shared strings

    $arr_SharedStrings = New-Object System.Collections.Generic.List[String];
    for ($i = 0; $i -lt $fileLines.sst.si.Count; $i++ )
    {
        $line = $fileLines.sst.si[$i];
        $tuple = [tuple]::create([int]$i, [string]("`"" + $line.InnerText + "`"") );
        $arr_SharedStrings.Add($tuple); # map the shared strings to the representational number inside of the xml files
    }

    #Endregion /Map shared strings

    [xml]$fileLines = Get-Content($workSheet1XmlFile);
    $sheetData = $fileLines.worksheet.sheetData;

    #Region Map rows
    $arr_sheet1_rows = New-Object System.Collections.ArrayList; # this will have a list/tuple of each cell that makes up a row
    for ($i = 0; $i -lt $sheetData.row.Count; $i++ )
    {
        $arr_sheet1_individualRow = New-Object System.Collections.Generic.List[String];
        $rowElements = $sheetData.row[$i].c; #grab data of all cells in that row
        foreach ($element in $rowElements)
        {
            $elementsTuple = [tuple]::create([string]$element.r, [string]$element.s, [string]$element.v)
            $arr_sheet1_individualRow.Add($elementsTuple);      
        }

        $rowTuple = [Tuple]::create([int] $i+1, $arr_sheet1_individualRow);
        $arr_sheet1_rows.Add($rowTuple) | Out-Null;
    }

    $rowListOfTuples = New-Object System.Collections.ArrayList; # has row number, then column letter, then cell value
    foreach ($rowMapped in $arr_sheet1_rows)
    {
        $cellValues = $rowMapped.Item2;
        $rowNumber = $rowMapped.Item1;
        foreach ($cellValue in $cellValues)
        {
            $cellValue = $cellValue.Remove(0, 1);
            $cellValue = $cellValue.Remove($cellValue.Length - 1, 1);
            $cellValue = $cellValue.Replace(" ", "");
            $cellValuesSplit = $cellValue.Split(",");
            $columnLetter = $cellValuesSplit[0];
            $columnLetter = $columnLetter -replace '[0-9]',''
            $cellValueOnly = $cellValuesSplit[2];

            $cellTuple = [tuple]::create([int]$rowNumber, [string]$columnLetter, [string]$cellValueOnly);
            $rowListOfTuples.Add($cellTuple) | Out-Null;
        }
    }

    #Endregion /Map shared rows

    #Region Map shared columns

    $rowCount = $arr_sheet1_rows.Count;
    $sheetDataCols = $fileLines.worksheet.cols;
    $arr_sheet1_columns = New-Object System.Collections.ArrayList; # this will have a list/tuple of each cell that makes up a column
    #$arr_sheet1_columns.Add("Place holder") | Out-Null;
    #for ($i = 0; $i -lt $sheetDataCols.col.Count; $i++ )
    for ($i = 0; $i -lt $rowCount; $i++ )
    {
        if ($arr_sheet1_columns.Count -lt $sheetDataCols.col.Count)
        {
            for ($j = 0; $j -lt $sheetDataCols.col.Count; $j++ )
            {
                switch  ($j)
                {
                    0 { $columnLetterTranslated = "A"; break; }
                    1 { $columnLetterTranslated = "B"; break; }
                    2 { $columnLetterTranslated = "C"; break; }
                    3 { $columnLetterTranslated = "D"; break; }
                    4 { $columnLetterTranslated = "E"; break; }
                    5 { $columnLetterTranslated = "F"; break; }
                    6 { $columnLetterTranslated = "G"; break; }
                    7 { $columnLetterTranslated = "H"; break; }
                    8 { $columnLetterTranslated = "I"; break; }
                    9 { $columnLetterTranslated = "J"; break; }
                    10 { $columnLetterTranslated = "K"; break; }
                    11 { $columnLetterTranslated = "L"; break; }
                    12 { $columnLetterTranslated = "M"; break; }
                    13 { $columnLetterTranslated = "N"; break; }
                    14 { $columnLetterTranslated = "O"; break; }
                    15 { $columnLetterTranslated = "P"; break; }
                    16 { $columnLetterTranslated = "Q"; break; }
                    17 { $columnLetterTranslated = "R"; break; }
                    18 { $columnLetterTranslated = "S"; break; }
                    19 { $columnLetterTranslated = "T"; break; }
                    20 { $columnLetterTranslated = "U"; break; }
                    21 { $columnLetterTranslated = "V"; break; }
                    22 { $columnLetterTranslated = "W"; break; }
                    23 { $columnLetterTranslated = "X"; break; }
                    24 { $columnLetterTranslated = "Y"; break; }
                    25 { $columnLetterTranslated = "Z"; break; }
                }
                $arr_sheet1_columns.Add($columnLetterTranslated) | Out-Null;
            }       
        }

        $rowElements = $sheetData.row[$i].c; #grab data of all cells in that row
        foreach ($element in $rowElements)
        {
            $columnLetter = $element.r -replace '[^a-zA-Z-]',''
            for ($k = 0; $k -lt $arr_sheet1_columns.Count; $k++)
            {
                $column = $arr_sheet1_columns[$k];
                if ($column.StartsWith($columnLetter))
                {
                    $columnTemp = $column.ToString() + "|" + $element.v;
                    $arr_sheet1_columns.Remove($column);
                    $arr_sheet1_columns.Insert($k, $columnTemp);
                    $stop = "here";
                }
            }
        }
    }

    $columnListOfTuples = New-Object System.Collections.ArrayList; # has column letter, than row number, then cell value
    foreach ($columnMapped in $arr_sheet1_columns)
    {
        $index = $columnMapped.IndexOf("|");
        $columnMappedLetter = $columnMapped.Remove($index);

        $columnIndividualValues = $columnMapped.Remove(0, $index);
        $columnIndividualValuesSplit = $columnIndividualValues.Split("|");

        $rowNumber = 0;
        foreach($columnIndividualValueSplit in $columnIndividualValuesSplit)
        {   
            $cellTuple = [tuple]::create([string]$columnMappedLetter, [int]$rowNumber, [string]$columnIndividualValueSplit);
            $columnListOfTuples.Add($cellTuple) | Out-Null;
            $rowNumber ++;
        }
    }

    #Endregion /Map shared columns



    # Here you have all of the data and you can parse it however you want, below is an example of that

    #Region Parse the extracted data

    $rowsForParsing = New-Object System.Collections.ArrayList;
    foreach ($arr_sheet1_row in $arr_sheet1_rows)
    {
        $rowNumber = $arr_sheet1_row.Item1;
        $cellValues = New-Object System.Collections.ArrayList; 
        $rowTemp = "";

        $indexer = 0;
        foreach ($rowValue in $arr_sheet1_row.Item2)
        {
            $valueSplit = $rowValue.Replace(" ", "");
            $valueSplit = $valueSplit.Replace("(", "");
            $valueSplit = $valueSplit.Replace(")", "");
            $valueSplit = $valueSplit.Split(",");

            $cellLocation = "";
            $cellLocation = $valueSplit[0];
            $cellSType = "";
            $cellSType = $valueSplit[1];
            $cellStringRepresentationValue = "";
            $cellStringRepresentationValue = $valueSplit[2];
            $translatedValue = "";

            if ($cellStringRepresentationValue -ne "")
            {
                switch ($cellSType)
                {
                    "1" { $translatedValue = getStringMappedValue($cellStringRepresentationValue); break;}  # represents column headers
                    "2" { $translatedValue = [datetime]::FromOADate($cellStringRepresentationValue); break;}  # number of days from January 1, 1900
                    "3" { $translatedValue = getStringMappedValue($cellStringRepresentationValue); break;}  # mapped value to strings table
                    "4" {$translatedValue = getStringMappedValue($cellStringRepresentationValue); break;}  # mapped value to strings table
                    "5" { $translatedValue = getStringMappedValue($cellStringRepresentationValue); break;}  # mapped value to strings table
                    "6" { Write-Host "#6: " $rowValue; break;}  # not sure what 6 represents yet
                    "7" { Write-Host "#7: " $rowValue; break;}  # not sure what 7 represents yet
                    "8" { Write-Host "#8: " $rowValue; break;}  # not sure what 8 represents yet
                    "9" { Write-Host "#9: " $rowValue; break;}  # not sure what 9 represents yet
                    "10" { Write-Host "#10: " $rowValue; break;}  # not sure what 10 represents yet
                    "11" { Write-Host "#11: " $rowValue; break;}  # not sure what 11 represents yet     
                }
            }

            if (($rowTemp -eq "") -and ($indexer -eq 0))
            {
                $rowTemp = "$translatedValue";
            }
            else
            {       
                $rowTemp = "$rowTemp" + "|" + "$translatedValue";
            }   

            $indexer++;
        }

        if ($rowNumber -eq 22)
        {
            $stop = "here";
        }

        $rowsForParsing.Add("$rowNumber|$rowTemp") | Out-Null;
    }

    $fileInformation = New-Object System.Collections.ArrayList;
    $topRow = $rowsForParsing[1];
    foreach ($rowParsed in $rowsForParsing)
    {
        $rowParsedSplit = $rowParsed.Split("|");    
        $date = $rowParsedSplit[1];

        $pass = $true;
        try
        {
            $dtVal = get-date $date;
        }
        catch
        {
            $pass = $false;
        }

        if ($pass -eq $true)
        {
            if ((get-date) -gt (get-date $date))
            {
                $dateApplicableRow = $rowParsed;
            }
            else
            {
                Write-Host "Top row is $topRow";
                Write-Host "This weeks row is $dateApplicableRow";
                $fileInformation.Add($topRow);
                $fileInformation.Add($dateApplicableRow);
                break;
            }
        }
    }

    #Endregion /Parse the extracted data

#EndRegion /Read Xlsx file
Bbb
  • 517
  • 6
  • 27