0

I am printing the column name, data type and the max length of the columns from an Excel file.

Below is the code and the output, need to export the output to the json file.

Excel file (assignment.xlsx):

<html>
<table >
  <thead>
    <tr>
      <th>Type</th>
      <th>Amount received</th>
      <th>Currency</th>
      <th>Flag</th>
      <th>Date</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Educationsl</td>
      <td>1422.00</td>
      <td>USD</td>
      <td>2018-11-30</td>
    </tr>
  </tbody>
</table>
</html>

Output

Column Name : Type
Data type : String
Size  : 11

Column Name : Amountreceived
Data type : Float
Size  : 6

Column Name : Currency
Data type : String
Size  : 3

Column Name : Flag
Data type : String
Size  : 1

Column Name : Date
Data type : Float
Date format  : ddmmyyyy

Code

import string
import re
import datetime
import xlrd
loc = r"C:\Users\Documents\PythonStuff\assignment.xlsx"
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)

for i in range(sheet.ncols):
    print ("Column Name : " + re.sub('[^A-Za-z0-9]+', '', sheet.cell_value(0, i).replace(' ','_')))
##Data Type
    if str(sheet.cell_type(1,i)) == '1':
        print  ("Data type : String ")
    elif isinstance(sheet.cell_value(1,i), float) == True:
        print  ("Data type : Float ")
    elif str(sheet.cell_type(1,i)) =='3':
        print  ("Data type : Date ")
## Date Format
    if str(sheet.cell_type(1,i)) =='3':
        print ("Date format  : " + "ddmmyyyy" +"\n ")
    else:
        print ("Size  : " + str(len (str(sheet.cell_value(1,i))))+"\n ")

Desired output

{"Excel": [{"Type": "Educational", "Amount received": "1422.00", "Currency": "USD", "Flag": "N", "Date": "2018-11-30"} ]}
mkrieger1
  • 19,194
  • 5
  • 54
  • 65
YogeshD
  • 1
  • 1
  • Have you tried using the [`json`](https://docs.python.org/3/library/json.html) module? – mkrieger1 Feb 07 '19 at 09:32
  • @mkrieger1: {"Excel": [{"Type": "Educational", "Amount received": "1422.00", "Currency": "USD", "Flag": "N", "Date": "2018-11-30"} ]} – YogeshD Feb 07 '19 at 09:39
  • @user5173426 : Tried json module, but not sure how to integrate it with the existing code – YogeshD Feb 07 '19 at 09:39

1 Answers1

0

I assume that you want to use the values of the 1st row as the dict keys and every row is an item in a the list. Add this to you code:

keys = [i.value for i in sheet.row(0)]
sheet_dict = {'Excel': list()}

for row in range(1, sheet.nrows):
    row_dict = {}
    for col in range(sheet.ncols):
        row_dict[keys[col]] = sheet.cell_value(rowx=row, colx=col)
    sheet_dict['Excel'].append(row_dict)

print(sheet_dict)

Or if you want a pretty output:

import json
print(json.dumps(sheet_dict, sort_keys=True, indent=4))
Frans
  • 799
  • 6
  • 7
  • Hello Frans, thanks for the answer provided. But the above code will print the data from the excel sheet. Instead I need to print the structure of the table (excel file) i.e its column name the data type of the value that column holds and the maximum length. – YogeshD Feb 12 '19 at 06:18