1

I have 120 different Excel raw data files and from those I need to pull data from around 500 (maybe more) different cells into a tabular format in one overview workbook. What is the most efficient way to do this?

The layout of each workbook is identical, so I have started linking my new overview sheet to the cells in the first data file and was planning on using find and replace to change the path to refer to the other data files. Which is laborious to say the least!

I'm also learning Python so would welcome Python based solutions, or pretty much any solution's really!

Thank you,

Joe

Maverick
  • 789
  • 4
  • 24
  • 45
  • you could do it with SQL and ADO – Nathan_Sav Nov 24 '16 at 15:51
  • @Nathan_Sav thanks for your response! For someone who has zero experience with either, would it be worth learning to do this? Or would it take longer to learn than to just do what I'm currently doing? – Maverick Nov 24 '16 at 16:03
  • It should be relatively easy to create VBA script that open every file in particular directory and copy/paste some particular values from everyone, but it will of course take time to compute so many workbooks one by one. See: http://stackoverflow.com/questions/10380312/loop-through-files-in-a-folder-using-vba – Limak Nov 24 '16 at 16:11
  • @Limak Thanks for that! I did think of using VBA, but since the cells I need to pull data from are all over the place on different worksheets it may take longer than its worth? E.g. I need the data from D27 to M61 and D65 to N74 in one worksheet, B7 to D17 in another etc. Plus there are blanks I need to ignore and it all needs rearranged to provide an overview in columns. What do you think? – Maverick Nov 24 '16 at 16:25
  • VBA is designed for manipulation in Excel files, and there is no other language better suited for this specific types of files. Excel is, of course, not best solution for store data, but if you are for some reason forced to use this program, VBA is your natural tool. – Limak Nov 24 '16 at 16:31
  • @Limak thanks for that, will go down the VBA route then! Yes, I agree. But it is a constant battle to get the powers that be on board with that! – Maverick Nov 24 '16 at 16:43
  • An alternative is [Power Query](https://www.google.co.uk/search?q=power+query&oq=power+query&aqs=chrome..69i57j69i65j0l4.3551j0j7&sourceid=chrome&ie=UTF-8). This is an amazing tool, built by Microsoft. Some versions of Excel include it out of the box. Others require a free download. With PQ you can build one importer and then run it against multiple sources. – David Rushton Nov 24 '16 at 17:09

1 Answers1

1

Python offers many modules. I would recommend the openpyxl module. You could read about it here. If I understood you correctly you want to combine multiple excel spreadsheets. The way I woudl do it is add a row to a new spreadsheet for each row in all the excel spreadsheets. I wrote a simple program to do this:

import openpyxl
import os
from os.path import join
spreadsheet = openpyxl.Workbook()
final_sheet = spreadsheet.get_sheet_by_name('sheet1')
x = 0
for(dir, dirs, files) in os.walk('C:\Users\Cheyn Shmuel\Documents'):
        for file in files:
            filename = join(dir, file)
            try:
                workbook = openpyxl.load_workbook(filename)
            except: 
                continue # in case there are files that aren't excel in that directory
            for s in workbook.get_sheet_names():
                sheet = workbook.get_sheet_by_name(s)
                for row in sheet.rows:
                    for cell in row:
                        try:
                            final_sheet[cell.coordinate[0] + str(int(cell.coordinate[1:]) + x)] = cell.value
                        except:
                            final_sheet[cell.coordinate[:1] + str(int(cell.coordinate[2:]) + x)] = cell.value
                x += sheet.get_highest_row()

spreadsheet.save('your file.xlsx')

This program will go through all the excel files in your directory and put them into a new spreadsheet, and then put the next one after that, and so on.

Cheyn Shmuel
  • 428
  • 8
  • 15