3

I have an Excel file (data.xlsx) with the following pattern and I need to read a value from a textbox inside that Excel file.

I am currently using pandas library and I tried to obtain the value, but unfortunately could not find any API for that objective.

Does anyone know how could this be done?

More info:

My question is a duplicate of this sister question from Java.

Edit:

I have provided a solution for anyone who wonders how to search for shapes (and possibly all other items) in an excel file, manually (that is, without external modules from pip). It's actually pretty simple. See my comment.

Community
  • 1
  • 1
yamb
  • 101
  • 1
  • 6
  • Put that code that you DID try and perhaps we can assist in fixing it. – Mark Schultheiss Apr 30 '17 at 13:15
  • I tried Googling around but nothing came up regarding Excel textboxes, so there was no code that I actually tried. I'm asking here to see if it's even possible in Python. Thanks! – yamb Apr 30 '17 at 13:17
  • Questions on StackOverflow should always come with code. Just asking whether something is possible or not is not in the spirit of the platform. – Charlie Clark May 01 '17 at 11:19
  • 2
    I disagree. [Example with 2000 upvotes](http://stackoverflow.com/questions/2432764/change-the-uri-url-for-a-remote-git-repository?rq=1), [example with 1000 upvotes](http://stackoverflow.com/questions/151005/create-excel-xls-and-xlsx-file-from-c-sharp?rq=1). I think you're just being pricks. But it doesn't matter, I solved this on my own. I even shared the solution for everyone to benefit. I think that's a bit more in the spirit of the platform. – yamb May 01 '17 at 14:21

4 Answers4

6

Thanks for all the kind help, but I worked this out on my own.

I got this to work using zipfile module. Apparently, Excel is actually a suite that works on compressed XML files (changing the *.xlsx to *.zip reveals the contents of the file) when saving and reading from *.xlsx, so I could easily search the text needed with the comfort of XML.

Here's the module I made. By calling Sheet('path/to/sheet.xlsx').shapes.text, you can now easily find the text inside textboxes:

import zipfile as z


class Sheet(str):
    @property
    def shapes(this):
        s = z.ZipFile(this)
        p='xl/drawings/drawing1.xml'  # shapes path, *.xlsx default
        p='drs/shapexml.xml'  # shapes path, *.xls default
        return XML(s.read(p))


class XML(object):
    def __init__(self, value):
        self.value = str(value)

    def __repr__(self):
        return repr(self.value)

    def __getitem__(self, i):
        return self.value[i]

    def tag_content(self, tag):
        return [XML(i) for i in self.value.split(tag)[1::2]]

    @property
    def text(self):
        t = self.tag_content('xdr:txBody')  # list of XML codes, each containing a seperate textboxes, messy (with extra xml that is)
        l = [i.tag_content('a:p>') for i in t]  # split into sublists by line breaks (inside the textbox), messy
        w = [[[h[1:-2] for h in i.tag_content('a:t')] if i else ['\n'] for i in j] for j in l]  # clean into sublists by cell-by-cell basis (and mind empty lines)
        l = [[''.join(i) for i in j] for j in w]  #  join lines overlapping multiple cells into one sublist
        return ['\n'.join(j) for j in l]  #  join sublists of lines into strings seperated by newline char

So now the pattern provided in my question will be output as ['comments extra'], while a pattern such as:

  1. This is

    Text in a textbox on

    a sheet

  2. And this is another text box somewhere else

    Regardless of the overlapped cells

Will be output as ['This is\nText in a textbox on\na sheet','And this is another text box somewhere else\nRegardless of the overlapped cells'].

You're welcome.

yamb
  • 101
  • 1
  • 6
  • 1
    While this will work for you in this particular instance, it is not a general solution. And rather than rolling your own XML parser it would be better to use one from the standard library. – Charlie Clark May 02 '17 at 08:46
  • 1
    Still happy if any general solution comes up! – yamb May 03 '17 at 08:10
  • 1
    i'm so glad to found your answer.... `changing the *.xlsx to *.zip reveals the contents of the file` thanks for this sentence – and-bri Jun 09 '18 at 15:10
1

You can use Dispatch:

from win32com.client import Dispatch

xl = Dispatch('Excel.Application')
wb = xl.Workbooks.Open(Filename = 'your file name/path')
ws = wb.Worksheets(sheet_index) 

Where sheet_index is whatever number corresponds to the sheet of interest in the workbook. ws.Shapes will have all of the shape objects on the sheet. You can access a shape (textbox) with an integer index, Shapes(index), and then check the name of the object with its name property Shapes(index).Name.

ws.Shapes(index).Name    

Once you figure out which shape you want, then you can view its text like this:

ws.Shapes(index).Characters().Text

Note that you have to call () the Characters method. To assign text, simply assign it. Or you could replace parts of it as well (like dates) with the standard replace method.

ws.Shapes(index).Characters().Text = 'Beluga Whales'

ws.Shapes(index).TextFrame.Characters().Text = ws.Shapes(index).TextFrame.Characters().Text.replace('original text', 'new text')
  • This works for me until the `ws.Shapes(1).Characters().Text` line. That returns `AttributeError: .Characters` The rest of it, `ws.Shapes(1)`, is a `>` – Adam Dec 09 '22 at 06:36
0

This is currently not possible with openpyxl (version 2.4)

Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
  • Thanks, I have shared a solution without openpyxl or any other external pip modules. – yamb May 01 '17 at 14:23
-1

I tried this to get value from text box.

    xls = ExcelFile(request.FILES['yourFileName'])
    df = xls.parse(xls.sheet_names[0])
    for i in df.values:
        print(i[0])               #here you get the value from text box

thanks

Rahul Gupta
  • 504
  • 4
  • 17