0

I have approximately 800+ word documents from which I am attempting to generate a master Table containing all the fields (not an actual word field) in the document.

An example of such documents are shown below:

enter image description here enter image description here

From this the desired table would be something like this:

enter image description here

Once I have this I can tidy it up in Power query and pivot.

Most importantly for my use case here, I need to know all the variations of Header Names which can occur. That being said, Knowing how to extract the complete table from word docs in batch to excel would be helpful in the future.

SOLUTION --> Update 3 (See deleted for previous updates): This very messy code starts to iterate over a selected folder location and generates the data but looses some of the formattings of the previous update(Minor issue) This is okay as It solves the major issue.

Sub CopyTables()
    
    Dim oWord As Word.Application
    Dim WordNotOpen As Boolean
    Dim oDoc As Word.Document
    Dim oTbl As Word.Table
    Dim fd As Office.FileDialog
    Dim FilePath As String
    Dim wbk As Workbook
    Dim wsh As Worksheet
    
    Dim diaFolder As FileDialog
    Dim selected As Boolean
    
    Dim strFile As String
    Dim pdfPath As String
    
    Dim oFSO As Object
    Dim oFolder As Object
    Dim oFile As Object
    Dim i As Integer
    Dim n As Long


    'Create New Workbook
    Set wbk = Workbooks.Add(Template:=xlWBATWorksheet)
    

    ' Get Folder location from User
    Set diaFolder = Application.FileDialog(msoFileDialogFolderPicker)
    
    With diaFolder
    diaFolder.AllowMultiSelect = False
    selected = diaFolder.Show
    If selected Then
        FilePath = diaFolder.SelectedItems(1)
        Debug.Print FilePath
        Set diaFolder = Nothing
        Else
        Beep
        
    End If
    End With
    
    On Error Resume Next
    
    Application.ScreenUpdating = False
    
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFolder = oFSO.GetFolder(FilePath)
    
    'Get the file Names
    
    For Each oFile In oFolder.Files
   
    Debug.Print oFolder.Path
    Debug.Print oFile.Name
    Debug.Print oFolder.Path & "\" & oFile.Name
    
    FilePath = oFolder.Path & "\" & oFile.Name
    
    
    '------------------------------------------------------------------
   
    'Get or start Word
    Set oWord = GetObject(Class:="Word.Application")
    If Err Then
        Set oWord = New Word.Application
        WordNotOpen = True
    End If
   
    'On Error GoTo Err_Handler
   
   '-------------------------------------------------------------------
   
    ' Open document
    Set oDoc = oWord.Documents.Open(FilePath)
    ' Loop through the tables
    
    'Set wsh = wbk.Worksheets.Add(After:=wbk.Worksheets(wbk.Worksheets.Count))
    
    For Each oTbl In oDoc.Tables
        ' Create new sheet
        'Set wsh = wbk.Worksheets.Add(After:=wbk.Worksheets(wbk.Worksheets.Count))

    '-------------------------------------------------------------------
    'NEED ASSISTANCE HERE APPEND TO BELOW CELL WITH BLANK SPACE
    
    ' Copy/paste the table
        oTbl.Range.Copy
        
        
        Sheets("Sheet1").Range("B" & Rows.Count).End(xlUp).Offset(1, 0).PasteSpecial xlPasteValues
        

    Next oTbl
    
    '------------------------------------------------------------------
    
    ' Delete the first sheet
    'Application.DisplayAlerts = False
    'wbk.Worksheets(1).Delete
    'Application.DisplayAlerts = True
    
    '------------------------------------------------------------------
    
   
Exit_Handler:
    On Error Resume Next
    oDoc.Close SaveChanges:=False
    If WordNotOpen Then
        oWord.Quit
    End If
    
    '------------------------------------------------------------------
    
    
    'Release object references
    Set oTbl = Nothing
    Set oDoc = Nothing
    Set oWord = Nothing
    Application.ScreenUpdating = True

    '------------------------------------------------------------------
   
'Err_Handler:
    'MsgBox "Word caused a problem. " & Err.Description, vbCritical, "Error: " & Err.Number
    'Resume Exit_Handler
    

    Next oFile
    

End Sub

In this code, each line of the document is captured in its own cell. This is okay and may actually be useful.

Will update here with any progress.

What im really not sure how is how to ensure it appends to the previous table.

The end product of this after running 800+ documents will be ridiculous but as I say I can clean up in PQ.

Nick
  • 789
  • 5
  • 22
  • 1
    Are all the word files in a single folder? Are there other files in said folder that you wouldn't want to take into the masterfile? If so, are there similarities in the word doc names to work with? Have a look at https://stackoverflow.com/questions/5971292/get-file-path-ends-with-folder together with the following https://stackoverflow.com/questions/68246938/get-file-list-from-folders-and-subfolders-excel-vba – Notus_Panda Dec 22 '22 at 09:58
  • No I will have a dedicated folder with all word files (Doc and Docx) that should all contain tables. However as noted above there can be some variation in the tables. I suppose some error handling for scenarios where I may have missed something wouldn't hurt. – Nick Dec 22 '22 at 10:14
  • Have you looked at the links I gave you about opening the files from folders after asking the user to choose a folder? I do have to mention that I'm not a huge fan of your choice to make 800+ worksheets in your masterfile. Seems a bit excessive and unnecessary, also not how you showed you wanted it in your "desired output". – Notus_Panda Dec 22 '22 at 11:19
  • I am currently looking at them and trying to incorporate parts into the code above. If you have any better suggestions its always appreciated. This is just a way to try and get the dirty data and then as I say I will tidy up using PQ. – Nick Dec 22 '22 at 11:26
  • 1
    You could append it all in the first sheet of the masterfile with something like this: https://stackoverflow.com/a/4465250/19353309 . Then it should be easier to also get your Filepath in front of all the cells, just remember to have a counter going to get your values in the right row in the Masterfile. – Notus_Panda Dec 22 '22 at 12:03
  • See also: https://stackoverflow.com/questions/55483014/extracting-specific-cells-from-a-folder-full-of-identical-word-tables/55504001#55504001 and https://forums.excelguru.ca/threads/help-with-vba-to-extract-data-from-word-to-excel.8900/post-36586 – macropod Dec 22 '22 at 13:55
  • Unfortunate unable to get your script to work, though I can see it works for others. Also I wish for the output of each table of different documents to be pasted below the previous one. – Nick Dec 22 '22 at 14:13
  • The code in "Update 2" although very rough, generates a new sheet for each file. Just need to figure out how to append to the end of the other. – Nick Dec 22 '22 at 14:14

2 Answers2

0

Try the following macro:

Sub GetTableData()
'Note: this code requires a reference to the Word object model.
'See under the VBE's Tools|References.
Application.ScreenUpdating = False
Dim strFolder As String, strFile As String, strSrc As String, WkSht As Worksheet
Dim c As Long, r As Long, t As Long, x As Long, y As Long
strFolder = GetFolder: If strFolder = "" Then Exit Sub
Dim wdApp As New Word.Application, wdDoc As Word.Document
Set WkSht = ActiveSheet: x = WkSht.Cells(WkSht.Rows.Count, 1).End(xlUp).Row
strFile = Dir(strFolder & "\*.doc", vbNormal)
With wdApp
  'Hide our Word session
  .Visible = False
  'Disable any auto macros in the documents being processed
  .WordBasic.DisableAutoMacros
  'Disable Word interface
  .DisplayAlerts = wdAlertsNone
  While strFile <> ""
    strSrc = strFolder & "\" & strFile
    Set wdDoc = .Documents.Open(Filename:=strSrc, AddToRecentFiles:=False, Visible:=False)
    strSrc = Split(strSrc, ".doc")(0)
    Application.StatusBar = "Processing " & strSrc
    With wdDoc
      With .Range
        .ListFormat.ConvertNumbersToText
        With .Find
          .ClearFormatting
          .Replacement.ClearFormatting
          .Text = "[^13^l]"
          .Replacement.Text = "¶"
          .Forward = True
          .Wrap = wdFindStop
          .Format = False
          .MatchWildcards = True
          .Execute Replace:=wdReplaceAll
        End With
      End With
      For t = 1 To .Tables.Count
        With .Tables(t)
          For r = 1 To .Rows.Count
            x = x + 1: y = 1: WkSht.Cells(x, y) = strSrc
            For c = 1 To .Columns.Count
              y = c + 1
              WkSht.Cells(x, y).Value = Split(.Cell(r, c).Range.Text, vbCr)(0)
            Next
          Next
        End With
      Next
      .Close SaveChanges:=False
    End With
    strFile = Dir()
  Wend
  .Quit
  Application.StatusBar = False
  WkSht.UsedRange.Replace What:="¶", Replacement:=Chr(10), LookAt:=xlPart, SearchOrder:=xlByRows
  WkSht.UsedRange.Columns.AutoFit
End With
Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
Application.ScreenUpdating = True
End Sub
 
Function GetFolder() As String
    Dim oFolder As Object
    GetFolder = ""
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
End Function
macropod
  • 12,757
  • 2
  • 9
  • 21
0
Sub CopyTables()
    
    Dim oWord As Word.Application
    Dim WordNotOpen As Boolean
    Dim oDoc As Word.Document
    Dim oTbl As Word.Table
    Dim fd As Office.FileDialog
    Dim FilePath As String
    Dim wbk As Workbook
    Dim wsh As Worksheet
    
    Dim diaFolder As FileDialog
    Dim selected As Boolean
    
    Dim strFile As String
    Dim pdfPath As String
    
    Dim oFSO As Object
    Dim oFolder As Object
    Dim oFile As Object
    Dim i As Integer


    'Create New Workbook
    Set wbk = Workbooks.Add(Template:=xlWBATWorksheet)



    ' Open the file location
    Set diaFolder = Application.FileDialog(msoFileDialogFolderPicker)
    With diaFolder
    
    diaFolder.AllowMultiSelect = False
    selected = diaFolder.Show

    If selected Then
        
        FilePath = diaFolder.SelectedItems(1)
        Debug.Print FilePath
        
        Set diaFolder = Nothing
        
        Else
        Beep
        
    End If
    End With
    
    On Error Resume Next
    
    

   
    Application.ScreenUpdating = False
    
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFolder = oFSO.GetFolder(FilePath)
    
    'Get the file Names
    
    For Each oFile In oFolder.Files
   
    Debug.Print oFolder.Path
    Debug.Print oFile.Name
    Debug.Print oFolder.Path & "\" & oFile.Name
    
    FilePath = oFolder.Path & "\" & oFile.Name
    
    
    '------------------------------------------------------------------
    
   
    'Get or start Word
    Set oWord = GetObject(Class:="Word.Application")
    If Err Then
        Set oWord = New Word.Application
        WordNotOpen = False
    End If
   
    'On Error GoTo Err_Handler
   
   '---------------------------------------------------------------------
   
    ' Open document
    Set oDoc = oWord.Documents.Open(FilePath)
    ' Loop through the tables
    For Each oTbl In oDoc.Tables
        ' Create new sheet
        Set wsh = wbk.ActiveSheet
        ' Copy/paste the table
        oTbl.Range.Copy
        
        Sheets("Sheet1").Range("B" & Rows.Count).End(xlUp).Offset(2, -1).Select
        ActiveCell.Value = oFile.Name
        
        
        Sheets("Sheet1").Range("B" & Rows.Count).End(xlUp).Offset(2, 0).Select
        
        
        wsh.Paste
        
        
    Next oTbl
    
    '------------------------------------------------------------------
    
   
    ' Delete the first sheet
    'Application.DisplayAlerts = False
    'wbk.Worksheets(1).Delete
    'Application.DisplayAlerts = True
   
Exit_Handler:
    On Error Resume Next
    oDoc.Close SaveChanges:=False
    If WordNotOpen Then
        oWord.Quit
    End If
    'Release object references
    Set oTbl = Nothing
    Set oDoc = Nothing
    Set oWord = Nothing
    Application.ScreenUpdating = True

   
'Err_Handler:
    'MsgBox "Word caused a problem. " & Err.Description, vbCritical, "Error: " & Err.Number
    'Resume Exit_Handler
    

    Next oFile
    

End Sub
Nick
  • 789
  • 5
  • 22