-1

I have an Excel sheet, where data is entered in this format:

{"id":"03CD2B3C22558D83","imp":6667, floor:7777}

How can I in python read and extract a certain value of an attribute in Excel, and copy this on another column? For example, I want to obtain the value 6667.

Gino Mempin
  • 25,369
  • 29
  • 96
  • 135
Monica
  • 11
  • 3
  • 1
    Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Mar 24 '22 at 18:56
  • 1
    Read [How to read data form excel sheets in Python](https://www.sitepoint.com/using-python-parse-spreadsheet-data/) and [How to write data to a specific cell in Python](https://stackoverflow.com/questions/49681392/python-pandas-how-to-write-in-a-specific-column-in-an-excel-sheet) – Alias Cartellano Mar 24 '22 at 19:01
  • Is that entire JSON/dict-like value entered in one cell? I'm assuming `floor` is a typo, and it's actually a string `"floor"` like `"id"` and `"imp"` – Gino Mempin Mar 25 '22 at 12:17

2 Answers2

1

This works, simple and no error checking:

enter image description here

MID(A1,FIND("imp",A1,1)+5,4)
Solar Mike
  • 7,156
  • 4
  • 17
  • 32
  • I think the question was asking for a Python solution: "*how i can in python to read and extract certain the value*". It was also tagged with python. – Gino Mempin Mar 24 '22 at 23:13
  • 1
    @GinoMempin if you look it is **also** tagged with excel, so I was being generous. – Solar Mike Mar 25 '22 at 04:41
  • @GinoMempin and given the poor use of English it is difficult to be absolutely sure - as pointed out in another comment. – Solar Mike Mar 25 '22 at 04:44
  • @GinoMempin and some other user may find the excel answer useful in the future - part of the reason people post on here, or don’t you understand that? – Solar Mike Mar 25 '22 at 05:04
  • @GinoMempin So where is your python solution? At least I was positive with my post… – Solar Mike Mar 25 '22 at 05:05
0

You can use openpyxl to work with Excel files.
The openpyxl tutorial should help you get started on how to use it.

I'll assume this kind of value:

{"id":"03CD2B3C22558D83","imp":6667, floor:7777}

is stored on each cell, like this:

screenshot of Excel file with cells A1-A5 with JSON string values

Using openpyxl, you first read the value of each cell as a string.

from openpyxl import load_workbook

# Assume Excel file named "test.xlsx"
# Assume sheet named "Sheet1"
wb = load_workbook(filename="test.xlsx")
ws = wb["Sheet1"]

# Get the value of cell A1 - A5
# In openpyxl, rows/columns start at index 1
for row_idx in range(1, 6):
    cell = ws.cell(row=row_idx, column=1)
    print(type(cell.value), cell.value)
$ python test.py
<class 'str'> {"id":"03CD2B3C22558D83","imp":6667, floor:7777}
<class 'str'> {"id":"1111111111111111","imp":1111, floor:1000}
<class 'str'> {"id":"2222222222222222","imp":2222, floor:2000}
<class 'str'> {"id":"3333333333333333","imp":3333, floor:3000}
<class 'str'> {"id":"4444444444444444","imp":4444, floor:4000

Then, once you can access each cell value, convert it to a regular Python dictionary.

Now, normally you can use json module's .loads to convert. But the problem is that the JSON string is malformed, it's missing double-quotes "" around floor. You can check on https://jsonlint.com/:

JSONLint error

It would have worked nicely if floor was wrapped in double-quotes:

>>> import json
>>> s = '{"id":"03CD2B3C22558D83","imp":6667, "floor":7777}'
>>> d = json.loads(s)
>>> d["imp"]
6667

If you have no control over how the data was written to the Excel file, check the workarounds for parsing a somewhat wrong JSON with Python. A quick workaround I usually use is the dirtyjson library:

>>> import dirtyjson
>>> s = '{"id":"03CD2B3C22558D83","imp":6667, floor:7777}'
>>> d = dirtyjson.loads(s)
>>> d
AttributedDict([('id', '03CD2B3C22558D83'), ('imp', 6667), ('floor', 7777)])
>>> d = dict(d)
>>> d
{'id': '03CD2B3C22558D83', 'imp': 6667, 'floor': 7777}
>>> d["imp"]
6667

So, assuming there are no typos in the example data, let's use dirtyjson to convert to a dict, then access the imp value:

import dirtyjson
from openpyxl import load_workbook

# Assume Excel file named "test.xlsx"
# Assume sheet named "Sheet1"
wb = load_workbook(filename="test.xlsx")
ws = wb["Sheet1"]

# For each value of cell A1 - A5
#   Convert the value to a dictionary
for row_idx in range(1, 6):  # In openpyxl, rows/columns start at index 1
    cell = ws.cell(row=row_idx, column=1)
    data = dict(dirtyjson.loads(cell.value))
    print(data, data["imp"])
$ python test.py
{'id': '03CD2B3C22558D83', 'imp': 6667, 'floor': 7777} 6667
{'id': '1111111111111111', 'imp': 1111, 'floor': 1000} 1111
{'id': '2222222222222222', 'imp': 2222, 'floor': 2000} 2222
{'id': '3333333333333333', 'imp': 3333, 'floor': 3000} 3333
{'id': '4444444444444444', 'imp': 4444, 'floor': 4000} 4444

Finally, simply write the value to another column. Let's assume, you want to copy it over to the B column on the same row. You can pick any other column or cell, just remember that openpyxl processes cell rows and columns with 1-index notation, so for the B column, that would be column=2.

import dirtyjson
from openpyxl import load_workbook

# Assume Excel file named "test.xlsx"
# Assume sheet named "Sheet1"
wb = load_workbook(filename="test.xlsx")
ws = wb["Sheet1"]

# For each value of cell A1 - A5
#   Convert the value to a dictionary
#   Write the "imp" value to the B column
for row_idx in range(1, 6):  # In openpyxl, rows/columns start at index 1
    cell = ws.cell(row=row_idx, column=1)
    data = dict(dirtyjson.loads(cell.value))
    ws.cell(
        row=row_idx,
        column=2,  # B
        value=data["imp"],
    )

# DO NOT FORGET TO SAVE AND CLOSE WHEN DONE!
# Also, keep the file closed when writing
wb.save(filename="test_output.xlsx")
wb.close()

sample output Excel file with values written to B column


Tested with

  • Python 3.9.10
  • openpyxl 3.0.9
  • dirtyjson 1.0.7
  • macOS 10.15.7 with Microsoft Excel for Mac 16.6.6
Gino Mempin
  • 25,369
  • 29
  • 96
  • 135