I have read several articles on machine learning.
An example is this https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1 and was wondering if it's possible to modify the code to do the use case below.
Let's say the machine is given an excel file (assuming .xlsx) that has only 1 sheet.
Inside the sheet there will be a table (defined as having a header row and at least 2 data rows and will be agreed as a table by common sense humans who read the file. Not a named table or object as defined by MS Excel)
The following is unknown before hand:
- the position of the table
- the number of rows and columns
- the data type in the table
This table in the excel file is also NOT named or identified beforehand in any way. There are no objects in this excel file. No named tables.
If I were to attempt to use VBA to list objects, I will get this error message
There will be at least 1 other cell outside of this table that will contain some text or value.
How or what algorithm using what software library can I train a machine to identify where does the table start and end?
E.g.
The table first cell (identified as the top left most cell in the header row) is A4 and the table last cell (identified as the bottom right most cell) is G12
The machine is able to read the excel file and say that the table is at A4:G12 even if there are other cells outside the table that has data.
My various google search involving "excel and machine learning" or "identify tables in excel" tends to give me articles on how to feed data to machine learning software using excel files or how to find/name data tables that are named in the Excel. I want to emphasize the table data is NOT named.
Also come across answers talking about extracting such table data from PDFs. But my focus is on Excel.
I have read that machine learning can be used to read images. So I imagine trying to "read" an excel file which is a highly structured file just to find where a table is should be possible.
Here are some sample screenshots
Here are the links to the Excel files
https://www.dropbox.com/s/l3vjjsgunp0zu23/A4toG12.xlsx?dl=0
https://www.dropbox.com/s/nwzw0211ruhwvf0/G7toN19.xlsx?dl=0
Please note that these are dummy files to illustrate my point about identifying the table data.
- The first one should be identified as A4 to G12
- The second one should be identified as G7 to N19
The comment by JanLauGe about ctrl
+ a
is interesting See Using machine learning to parse excel file and extract table data with no named tables involved? If so, how do I get started?
I have googled for the equivalent in VBA but to no avail.
What I have tried
I wrote a script as suggested by Scott Craner to find all the cells with values I get this.
Thanks to his comments, I got it to work.
Public Sub LookForCells()
For Each block In ActiveSheet.Columns("A:Z").SpecialCells(xlCellTypeConstants, 23).Areas
MsgBox block.Count
Next block
End Sub
And you should see the following