0

I currently have about 100 excel files each with two columns. The first column contains the "headers" and the second column has the values. They each look something like this:

ExcelSpreadsheet

I want to combine these, so that there is one final excel file that contains the data from ALL of these; so that it will look something like this (just with way more columns):

Combined (How it should look at the end, but with all 100 columns)

The other problem also is that not all of the files have those headers in the same order. If you look at that "Combined" picture, for example, you'll see that those two items had common headers. However, in some of the other files, the header orders may be switched. For instance, "GPU Variant" may be before "GPU Name", etc.

So essentially, here's what I need to do. Find a way to combine the second column of all of these spreadsheets, and then find a way to sort them so that they match up the 1st column.

If there's a way to program a macro to do this, can someone guide me as to how to do it? Are there external programs that are already designed to do this? Excel VBA maybe? This is the code that I have right now, but I don't think this addresses it properly:

import xlwt
import xlrd
import os
import csv


current_file = xlwt.Workbook()
write_table = current_file.add_sheet('sheet1', cell_overwrite_ok=True)

key_list = [u'GPU Name:', u'GPU Variant:', u'Architecture:', u'Process Size:', u'Transistors:', u'Die Size:', u'Released:']
for title_index, text in enumerate(key_list):
    write_table.write(0, title_index, text)


file_list = ['2874.csv', '2875.csv']

i = 1
for name in file_list:
    data = xlrd.open_workbook(name)

table = data.sheets()[0]
nrows = table.nrows
for row in range(nrows):
    if row == 0:
        continue
    for index, context in enumerate(table.row_values(row)):
        write_table.write(i, index, context)
    i += 1


current_file.save(os.getcwd() + '/result.csv')
Community
  • 1
  • 1
  • Potentially relevant: https://stackoverflow.com/questions/25400240/using-pandas-combining-merging-2-different-excel-files-sheets – dashiell Jul 26 '17 at 17:27

1 Answers1

0

Comment: I keep on getting "Missing' in the third, fourth, fifth, etc. column

Add the following print(... and Edit your Question to show the Output:

            for values in csv_reader:
                # Init Header Order
                header_keys.append(values['header'])
                ws.append((values['header'], values['data']))
            print('header_keys:{}'.format(header_keys)
        else:

Question: ... combine the second column of all ... sort them so that they match up the 1st column

The following is a csv/openpyxl Solution:
Reading n CSV Files aggregate the Second Column sorted like in the First CSV File.

from openpyxl import Workbook
import csv

wb = Workbook()
ws = wb.worksheets[0]

header_keys = []
for n, fName in enumerate(['2874.csv', '2875.csv']):
    with open(fName) as fh:
        csv_reader = csv.DictReader(fh, fieldnames=['header', 'data'], delimiter='\t')
        if n == 0:
            for values in csv_reader:
                # Init Header Order
                header_keys.append(values['header'])
                ws.append((values['header'], values['data']))
        else:
            # Read all Data to Dict 
            data = {}
            for values in csv_reader:
                data[values['header']] = values['data']

            # Write all Data in header_keys Order
            column = n + 2
            for row, key in enumerate(header_keys, 1):
                try:
                    ws.cell(row=row, column=column).value = data[key]
                except:
                    print('FAIL: Key "{}" not in Dict data'.format(key))
                    ws.cell(row=row, column=column).value = 'MISSING'

wb.save('result.xlsx')

Tested with Python: 3.4.2 - openpyxl: 2.4.1 - LibreOffice: 4.3.3.2

stovfl
  • 14,998
  • 7
  • 24
  • 51
  • I keep on getting "Missing' in the third, fourth, fifth, etc. columns (depending on how many files I put). It just says "missing" instead of having the values from the tables. – James Flanagin Aug 01 '17 at 18:09