0

Apologies if this is a basic question, but let us say I have a tab delimited file named file.txt formatted as follows:

Label-A    [tab]    Value-1

Label-B    [tab]    Value-2

Label-C    [tab]    Value-3

[...]

Label-i    [tab]    Value-n

I want xlrd or openpyxl to add this data to the excel worksheet named Worksheet in the file workbook.xlsx such that the cells contain the following values. I do not want to affect the contents of any other part of workbook.xlsx other than the two columns that are affected

A1=Label-A

B1=Value-1

A2=Label-B

B2=Value-2

[etc.]

EDIT: Solution

import sys
import csv
import openpyxl

tab_file = sys.stdin.readlines()

reader = csv.reader(tab_file, delimiter='\t')
first_row = next(reader)
num_cols = len(first_row)

try:
    workbook = sys.argv[1]
    write_sheet = sys.argv[2]
except Exception:
    raise sys.exit("ERROR")

try:   
    first_col = int(sys.argv[3])
except Exception:
    first_col = 0

tab_reader = csv.reader(tab_file, delimiter='\t')
xls_book = openpyxl.load_workbook(filename=workbook)
sheet_names = xls_book.get_sheet_names()
xls_sheet = xls_book.get_sheet_by_name(write_sheet)
for row_index, row in enumerate(tab_reader):
    number = 0
    col_number = first_col
    while number < num_cols:
        cell_tmp = xls_sheet.cell(row = row_index, column = col_number)
        cell_tmp.value = row[number]
        number += 1
        col_number += 1
xls_book.save(workbook)
Bas Swinckels
  • 18,095
  • 3
  • 45
  • 62
user3543052
  • 195
  • 1
  • 2
  • 8
  • 1
    Is not basic, but you should show some effort on what you tried to do yourself. That way, people looking to answer this question will have something to work with (otherwise you risk getting downvoted too) – Savir Apr 16 '14 at 21:56
  • 2
    Thanks for the advice. I have added in what I have done, but this is my first time using Python (generally work in Bash) so I'm unsure how helpful it will be to readers. – user3543052 Apr 16 '14 at 22:29
  • If you don't set use_iterators=True then you can modify an existing file. – Charlie Clark Apr 17 '14 at 14:07
  • 1
    One (slight)more recommendation for Stack Overflow: You don't need to edit the title of your question to mark it as **solved** (as other forums recommend you to do) The fact that the post has an answer selected as the *Chosen answer* will tell other people that. BYW, kuddos for clearly labeling your EDIT with an EDIT, and not just changing the original question. Well done **:-)** – Savir Apr 17 '14 at 20:18

3 Answers3

1

Since you said you are used to working in Bash, I'm assuming you're using some kind of Unix/Linux, so here's something that will work on Linux.

Before pasting the code, I'd like to point a few things:

Working with Excel in Unix (and Python) is not that straightforward. For instance, you can't open an Excel sheet for reading and writing at the same time (at least, not as far as I know, although I must recognize that I have never worked with the openpyxl module). Python has two well known modules (that I am used to working with :-D ) when it comes to handling Excel sheets: One is for reading Excel sheets (xlrd) and the second one for writing them (xlwt) With those two modules, if you want to modify an existing sheet, as I understand you want to do, you need to read the existing sheet, copying it to a writable sheet and edit that one. Check the question/answers in this other S.O. question that explain it with some more detail.

Reading whatever-separated files is much easier thanks to the csv module (its prepared for comma-separated files, but it can be easily tweaked for other separators). Check it out.

Also, I wasn't very sure from your example if the contents of the tab-separated file indicate somehow the row indexes on the Excel sheet or they're purely positional. When you say that in the tab-separated file you have Value-2, I wasn't sure if that 2 meant the second row on the Excel file or it was just an example of some text. I assumed the latest (which is easier to deal with), so whatever pair Label Value appears on the first row of your tab-separated file will be the first pair on the first row of the Excel file. It this is not the case, leave a comment a we will deal with it ;-)

Ok, so let's assume the following scenario:

You have a tab-separated file like this:

stack37.txt:

Label-A Value-1
Label-B Value-2
Label-C Value-3

The excel file you want to modify is stack37.xls. It only has one sheet (or better said, the sheet you want to modify is the first one in the file) and it initially looks like this (in LibreOffice Calc):

enter image description here

Now, this is the python code (I stored it in a file called stack37.py and it's located in the same directory of the tab-separated file and the excel file):

import csv
import xlwt
import xlrd
from xlutils import copy as xl_copy

with open('stack37.txt') as tab_file:
    tab_reader = csv.reader(tab_file, delimiter='\t')
    xls_readable_book = xlrd.open_workbook('stack37.xls')
    xls_writeable_book = xl_copy.copy(xls_readable_book)
    xls_writeable_sheet = xls_writeable_book.get_sheet(0)
    for row_index, row in enumerate(tab_reader):
        xls_writeable_sheet.write(row_index, 0, row[0])
        xls_writeable_sheet.write(row_index, 1, row[1])
    xls_writeable_book.save('stack37.xls')

After you run this code, the file stack37.xls will look like this:

enter image description here

What I meant about not knowing what you exactly wanted to do with the values in your tab-separated file is that regardless of what you name your items in there, it will modify the first row of the excel sheet, then the second... (even if your first Value is called Value-2, the code above will not put that value on the second row of the Excel sheet, but on the fist row) It just assumes the first line in the tab-separated file corresponds with the values to set on the first row of the Excel sheet.

Let explain with an slightly modified example:

Let's assume your original Excel file looks like the original excel file on my screenshot (the full of | Hello-Ax | Bye-Bx |) but your tab-separated file now looks like this:

stack37.txt:

foo bar
baz baz2

After you run stack37.py, this is how your Excel will look like:

enter image description here

(see? first row of the tab-separated file goes to the first row in the Excel file)

UPDATE 1:

I'm trying the openpyxl module myself... Theoretically (according to the documentation) the following should work (note that I've changed the extensions to Excel 2007/2010 .xlsx):

import csv
import openpyxl

with open('stack37.txt') as tab_file:
    tab_reader = csv.reader(tab_file, delimiter='\t')
    xls_book = openpyxl.load_workbook(filename='stack37.xlsx')
    sheet_names = xls_book.get_sheet_names()
    xls_sheet = xls_book.get_sheet_by_name(sheet_names[0])
    for row_index, row in enumerate(tab_reader):
        cell_tmp1 = xls_sheet.cell(row = row_index, column = 0)
        cell_tmp1.value = row[0]
        cell_tmp2 = xls_sheet.cell(row = row_index, column = 1)
        cell_tmp2.value = row[1]
    xls_book.save('stack37_new.xlsx')

But if I do that, my LibreOffice refuses to open the newly generated file stack37_new.xlsx (maybe is because my LibreOffice is old? I'm in a Ubuntu 12.04, LibreOffice version 3.5.7.2... who knows, maybe is just that)

Community
  • 1
  • 1
Savir
  • 17,568
  • 15
  • 82
  • 136
  • Thanks. This works on my local machine (I did not intend for the string values to correspond to Excel row values), however, since my system administrators have only imported openpyxl and xlrd those are the only ones I can use if I want it to run remotely (as would be my strong preference). I'm going to use a version of this for now, but I'd like to also see if this can be done with openpyxl (this thread has some promise: http://stackoverflow.com/questions/15004838/how-to-write-a-list-to-xlsx-using-openpyxl) – user3543052 Apr 17 '14 at 14:06
  • 1
    openpyxl does support modifying an existing file. – Charlie Clark Apr 17 '14 at 14:06
  • @user2385133: Yeah, I was trying it right now... but I don't know what it did (or **I** did) that when I tried to adapt the code to `openpyxl`, it corrupted the new file (libreoffice won't open it)... Boh? I'll keep trying, though – Savir Apr 17 '14 at 14:13
  • Hmmm..okay. Would it be possible then, based on your understanding, to create a new workbook entirely based on portions of the old workbook? So let's say I have two tabs in an existing workbook, "Keep" and "Change." I want to create a new workbook that has "Keep" exactly as it was in the old workbook, and then "Change" which is completely distinct (and places in the data from the text file). Based on my project needs, a solution like that would also work. – user3543052 Apr 17 '14 at 14:17
  • @user3543052, with `openpyxl` or the "old" `xlrd`/`xlwt`? (I'm hoping for the latest, because I don't seem to be able to produce a working `.xlsx` excel sheet with `openpyxl`... at least, nothing LibreOffice likes **:-D** If it's the oldies (xlwt), sure, you can do that (by *tabs* you mean Sheets, right? Not *columns*?... is not that it matters that much, just checking... xlrd and xlwt are powerful!! **:-)** ) – Savir Apr 17 '14 at 14:22
  • @user3543052, I've updated my answer (at the bottom of it)... Who knows? Maybe you'll have better luck opening the resulting file than I did... – Savir Apr 17 '14 at 14:34
  • For *writing* `.xlsx` files, the best package out there right now is `xlsxwriter`. Like `xlwt`, it's output-only. But lots of folks do use `openpyxl` successfully, and it's the only one which both reads and writes. – John Y Apr 17 '14 at 14:35
  • @JohnY, Good to know! I didn't know about that module (thanks, I'll check it out). But there's an "administrative" issue in this particular case: The OP seems to depend on a system that only has xlrd and openpyxl installed (see first comment in this answer) – Savir Apr 17 '14 at 14:41
  • @BorrajaX Your openpyxl solution worked!!!! Thanks a bunch! Only leftover questions are: 1) Is it possible to change where the data shows up in the worksheet just by changing the numbers? 2) If 1 is true, is it possible to also add other text files to the workbook in other locations on the sheet? I assume these are both possible, but I just want to make sure openpyxl doesn't have any dependencies that would restrict them. – user3543052 Apr 17 '14 at 14:53
  • @user3543052, **Probably** yes (you just have to know what sheets you're opening and keep control of the "coordinates") although as I said before, I'm not so used to `openpyxl` and I can't currently test it (I'm basing this comment on the scarce documentation of `openpyxl`). You'd have to know somehow in your tab-separated file what positions (cells) you want to modify on which sheets, if applicable (for instance, one entry in your tab file could be `0 2 5 foo` (with that, you could easily navigate to sheet 0, row 2, column 5 of the xlsx file and set its value to `foo`... things like that) – Savir Apr 17 '14 at 15:01
  • Alright thanks. Based on what you're telling me it's probably easier to just reconfigure my upstream processes so that all the necessary information is in one text file, but I'll play around with it. Either way thanks for the help. – user3543052 Apr 17 '14 at 15:17
  • @user3543052, Maybe, yeah... But it's all a matter of trying and come with a suitable solution. You can try something and if you have questions, ask again (with some specific test/code that you tried **;-)** Have fun coding! – Savir Apr 17 '14 at 17:34
  • If you are interested I posted what I finally ended up using at the top. Decided to allow the user to input his/her own workbook name, worksheet name, and (optionally) the column to start on. I also made it read stdin instead of a text file. – user3543052 Apr 17 '14 at 20:05
  • @user3543052, Neat! As a bonus... if you want something to parse arguments from command line, take a look to https://docs.python.org/dev/library/argparse.html (allows you to do stuff like `python ./myscript.py --sheet=SheetName --start_column=0` Things like that. Quite fancy! **:-)** – Savir Apr 17 '14 at 20:14
  • @BorrajaX Best to submit a bug report if a file is being generated that you can't read. – Charlie Clark Apr 17 '14 at 22:51
0

That's a job for VBA, but if I had to do it in Python I would do something like this:

import Excel
xl = Excel.ExcelApp(False)
wb = xl.app.Workbooks("MyWorkBook.xlsx")
wb.Sheets("Ass'y").Cells(1, 1).Value2 = "something"
wb.Save()

With an helper Excel.py class like this:

import win32com.client

class ExcelApp(object):
    def __init__(self, createNewInstance, visible = False):
        self._createNewInstance=createNewInstance

        if createNewInstance:
            self.app = win32com.client.Dispatch('Excel.Application')
            if visible:
                self.app.Visible = True
        else:
            self.app = win32com.client.GetActiveObject("Excel.Application")

    def __exit__(self):
        if self.app and self._createNewInstance:
            self.app.Quit()

    def __del__(self):
        if self.app and self._createNewInstance:
            self.app.Quit()

    def quit(self):
        if self.app:
            self.app.Quit()
stenci
  • 8,290
  • 14
  • 64
  • 104
  • This approach requires Excel for Windows. A lot of folks work with the files but not with Excel and not on Windows. – John Y Apr 17 '14 at 14:26
0

You should use the CSV module in the standard library to read the file.

In openpyxl you can have something like this:

from openpyxl import load_workbook
wb = load_workbook('workbook.xlsx')
ws = wb[sheetname]
for idx, line in enumerate(csvfile):
    ws.cell(row=idx, column=0) = line[0]
    ws.cell(row=idx, column=1) = line[1]
wb.save("changed.xlsx")
Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
  • So I like the simplicity of this but I'm running into some trouble. Here is what I am doing: 1) Convert my file to csv and save it as "csvfile.txt" 2) set csvfile='csvfile.txt' 3) Run this code. Am I doing something wrong? – user3543052 Apr 17 '14 at 14:34
  • @user3543052 without seeing your code and a test file I have no idea what is going right or wrong. What are you converting to CSV? – Charlie Clark Apr 17 '14 at 22:54