
I am new to Python. This is the call I am trying to make:

X, Y = load_data('./examples/data/scene')

Here is the code for the function:

import numpy as np
import gzip
import pickle
import itertools as it
import os
import arff    # liac-arff
import xml.etree.ElementTree as ET
import pandas as pd

def load_data(dataset_path: str):
"""Dataset loading function for dataset downloaded from mulan.
"""
arff_filename = dataset_path + ".arff"
xml_filename = dataset_path + ".xml"
X, Y = load_arff(arff_filename, xml_filename)
return X, Y

def load_arff(arff_filename: str, xml_filename: str):
# read arff file
with open(arff_filename, "r") as fp:
    data = arff.load(fp)

# read xml file
tree = ET.parse(xml_filename)
root = tree.getroot()
label_list = []
for child in root:
    label_list.append(child.attrib["name"])
#for attr in range(len(data["attributes"])):
#   column_list = attr[0]
column_list = [attr[0] for attr in data["attributes"]]
feature_list = list(set(column_list) - set(label_list))

# build converters to convert nominal data to numerical data
converters = {}
for attr in data["attributes"]:
    if attr[1] == 'NUMERIC':
        pass
    elif isinstance(attr[1], list):
        converter = {}
        for e, cls in enumerate(attr[1]):
            converter[cls] = e
        converters[attr[0]] = converter
    else:
        raise NotImplementedError("attribute {} is not supported.".format(attr[1]))
#print(converters, column_list, feature_list)

# ipdb.set_trace()
df = pd.DataFrame(data['data'], columns=column_list)
df.replace(converters, inplace=True)
# print "Read as sparse format"
# n_instance = len(data["data"])
# dense_data = np.zeros( (n_instance, len(feature)+len(label)), dtype=float)
# for i,instance in enumerate(data["data"]):
#     for sf in instance:
#         idx, val = sf.split(' ')
#         dense_data[i][int(idx)] = val
# data = dense_data

X = df[feature_list].values
Y = df[label_list].values
if not np.issubdtype(Y.dtype, np.integer):
    raise ValueError("Y is not int.")

return X, Y

def pairwise_hamming(Z, Y):
"""
Z and Y should be the same size 2-d matrix
"""
return -np.abs(Z - Y).mean(axis=1)


def pairwise_f1(Z, Y):
"""
Z and Y should be the same size 2-d matrix
"""
# calculate F1 by sum(2*y_i*h_i) / (sum(y_i) + sum(h_i))
Z = Z.astype(int)
Y = Y.astype(int)
up = 2*np.sum(Z & Y, axis=1).astype(float)
down1 = np.sum(Z, axis=1)
down2 = np.sum(Y, axis=1)

down = (down1 + down2)
down[down==0] = 1.
up[down==0] = 1.

#return up / (down1 + down2)
#assert np.all(up / (down1 + down2) == up/down) == True
return up / down

This is the error I get when I run the code:

Traceback (most recent call last):
  File "C:\Users\sambhav\Desktop\RethinkNet\examples\classification.py", line 63, in <module>
    main()
  File "C:\Users\sambhav\Desktop\RethinkNet\examples\classification.py", line 57, in main
    CSRPE_example()
  File "C:\Users\sambhav\Desktop\RethinkNet\examples\classification.py", line 25, in CSRPE_example
    X, Y = load_data('./examples/data/scene')
  File "C:\Users\sambhav\Desktop\RethinkNet\mlearn\utils\__init__.py", line 18, in load_data
    X, Y = load_arff(arff_filename, xml_filename)
  File "C:\Users\sambhav\Desktop\RethinkNet\mlearn\utils\__init__.py", line 34, in load_arff
    column_list = [attr[0] for attr in data['attributes']]
TypeError: 'generator' object is not subscriptable

I am unable to figure this out. Any help would be appreciated.
Link to the file: https://drive.google.com/file/d/128tOss08QpU0txq49fbt2dADrX4Yacl8/view?usp=sharing

  • Your code is not properly indented, but from what you shared I can tell that `arff.load(fp)` returns a generator of rows, while you're accessing it as `data['attributes']`. That doesn't work because a generator isn't subscriptable. – Grismar Nov 24 '21 at 04:43
  • So how can I change this? Should I use a function other than `arff.load`? – Sambhav Jain Nov 24 '21 at 04:49
  • Here is something similar that I found, but I am unable to adapt this function accordingly: https://stackoverflow.com/a/6288032 – Sambhav Jain Nov 24 '21 at 04:57
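
To make the point about subscripting concrete, here is a minimal sketch that reproduces the same error message without any ARFF file. `row_generator` and `try_subscript` are hypothetical names made up for this illustration; they are not part of any `arff` package.

```python
def row_generator():
    """Hypothetical stand-in for a loader that yields one row at a time."""
    yield [0.1, 0.2, 1]
    yield [0.3, 0.4, 0]


def try_subscript(obj, key):
    """Return obj[key], or the TypeError message if obj cannot be subscripted."""
    try:
        return obj[key]
    except TypeError as exc:
        return str(exc)


# A generator object has no __getitem__, so any subscript raises TypeError.
message = try_subscript(row_generator(), "attributes")
print(message)

# A dict with an "attributes" key is what load_arff expects to receive.
ok = try_subscript({"attributes": [("f1", "NUMERIC")]}, "attributes")
print(ok)
```

The first call fails in the same way as line 34 of `load_arff`, while the second succeeds, so the real question is why `arff.load` hands back a generator instead of a dict.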

1 Answer


It seems like you need to convert your generator object into a dictionary/list.

Instead of column_list = [attr[0] for attr in data["attributes"]] you could do something like this:

data_list=[]
for i in data:
    data_list.append(i)

Then use `print(data_list)` to see what kind of data you get.
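
As a rough sketch of what that inspection would show, the snippet below uses a hypothetical generator, `fake_rows`, in place of `data`. Copying a generator into a list makes it indexable by position, but the list still has no "attributes" key, and the generator is empty after a single pass.

```python
def fake_rows():
    """Hypothetical stand-in for the generator held in `data`."""
    yield [0.1, 0.2, 1]
    yield [0.3, 0.4, 0]


gen = fake_rows()

# list() consumes the generator, exactly like the append loop above.
data_list = list(gen)
print(type(data_list).__name__, len(data_list))
print(data_list[0])

# A second pass over the same generator yields nothing: it is exhausted.
second_pass = list(gen)
print(second_pass)

# Integer indexing works on the list, but string keys still do not.
try:
    data_list["attributes"]
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```

So materializing the rows helps with inspection, but on its own it does not supply the attribute metadata that `load_arff` reads from `data["attributes"]`.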

Regretful
  • I tried `data = arff.load(open(arff_filename))`. Unfortunately I get the same error: File "C:\Users\sambhav\Desktop\RethinkNet\mlearn\utils\__init__.py", line 35, in load_arff column_list = [attr[0] for attr in data["attributes"]] TypeError: 'generator' object is not subscriptable – Sambhav Jain Nov 24 '21 at 07:47
  • This still uses `data` as a generator. – Sambhav Jain Nov 24 '21 at 07:48
  • Did you fix your indentation? – Regretful Nov 24 '21 at 09:55
  • I think the comments are trying to get you to figure this out for yourself, but the point is that you have to convert your generator named `data` into a `dict` so you can subscript it, which has something to do with how `arff.load` works. I think you're missing an important step somewhere. – Andrew Jaffe Nov 24 '21 at 10:26
  • @Regretful Yes, the indentation is correct in the actual code. I posted a link to the .py script. – Sambhav Jain Nov 24 '21 at 10:34
  • I shall try to convert `data` to a dict, but I guess the solution has something to do with the itertools library, and I am unable to figure it out. – Sambhav Jain Nov 24 '21 at 10:35
  • Let me know if you would rather I delete the answer because of the mess :) – Regretful Nov 24 '21 at 11:13
  • Never mind, @Regretful. Here is the output: X, Y = load_arff(arff_filename, xml_filename) File "C:\Users\sambhav\Desktop\RethinkNet\mlearn\utils\__init__.py", line 34, in load_arff for i in data: File "C:\Users\sambhav\anaconda3\lib\site-packages\arff\__init__.py", line 239, in load with open(fname, 'r') as fhand: TypeError: expected str, bytes or os.PathLike object, not TextIOWrapper – Sambhav Jain Nov 24 '21 at 11:57
  • I believe it should be something like this @Regretful https://stackoverflow.com/a/6288032 – Sambhav Jain Nov 24 '21 at 11:59
  • Now it looks like you are not able to open the file? The fname variable does not seem to be accepted by open(). – Regretful Nov 25 '21 at 06:50
  • Looks like I may need to write new code. This one seems to have a lot of errors. – Sambhav Jain Nov 25 '21 at 13:04
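
One possible explanation, which nothing in this thread confirms: two different PyPI distributions install a module named `arff`. The import comment in the question names `liac-arff`, whose `arff.load(fp)` takes an open file and returns a dict with "attributes" and "data" keys. The second traceback shows `load` calling `open(fname, 'r')` inside `site-packages\arff\__init__.py`, which suggests a different `arff` distribution whose `load` expects a filename and yields rows. The sketch below checks which distributions are installed and fails early with a clearer message. `installed_versions` and `require_arff_dict` are hypothetical helper names, not part of either package.

```python
import types
from importlib import metadata


def installed_versions(names=("liac-arff", "arff")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def require_arff_dict(data):
    """Return data if it looks like a dict-style ARFF result, else raise clearly."""
    if isinstance(data, types.GeneratorType):
        raise TypeError(
            "arff.load returned a generator, not a dict; the imported 'arff' "
            "module is probably not liac-arff"
        )
    # Assumption: load_arff only needs these two keys from the result.
    missing = {"attributes", "data"} - set(data)
    if missing:
        raise KeyError("ARFF result is missing keys: {}".format(sorted(missing)))
    return data


# Shows which of the two candidate distributions are present in this environment.
print(installed_versions())
```

In `load_arff`, the guard could wrap the existing call as `data = require_arff_dict(arff.load(fp))`. If it fires and the version check reports `arff` but not `liac-arff`, then running `pip uninstall arff` followed by `pip install liac-arff` would be the thing to try, again assuming this diagnosis is right.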