I am learning Machine Learning and trying to write some code myself using the Iris dataset.

I load the dataset with pandas and then try to use a dictionary to convert the last column from strings to integers, but when I try this:

    dataset.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']

    class_mapping = {'Iris-setosa': 1, 'Iris-versicolor': 2, 'Iris-virginica': 3}
    for classe in dataset:
        classe['class'] = classe['class'].map(class_mapping)

Running it in PyCharm raises this error: TypeError: string indices must be integers
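For readers hitting the same error, here is a minimal sketch of what seems to be going on, assuming a tiny made-up DataFrame in place of the real Iris file. Iterating over a pandas DataFrame yields its column labels (plain strings), so inside the loop `classe['class']` is indexing a string with another string.

```python
import pandas as pd

# Tiny stand-in for the Iris data; the values are made up for illustration.
dataset = pd.DataFrame({
    "sepal length": [5.1, 7.0],
    "class": ["Iris-setosa", "Iris-versicolor"],
})

# Iterating over a DataFrame yields column labels, not rows.
labels = [item for item in dataset]

# Indexing one of those label strings with 'class' reproduces the TypeError.
try:
    labels[0]["class"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(labels)
print(error_message)
```

The exact wording of the message can vary slightly between Python versions, but it should start with "string indices must be integers".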

U13-Forward
Rezende
  • You could just do `dataset['class'] = dataset['class'].map(class_mapping)` – cs95 Jun 15 '18 at 03:13
  • Now, instead of printing 'Iris-setosa' it prints 'NaN'. I don't understand why, but at least it works. – Rezende Jun 15 '18 at 12:34
  • Out of curiosity, why are you doing this? Are you trying to create dummy variables? – Dillon Jun 16 '18 at 00:31
  • @Dillon No. I just finished a Machine Learning course and I have a project for which I need to train my code. It has 15 different output values (strings), i.e. labeled data. I am trying to find the best option, and instead of coding directly in my project I am implementing everything on the Iris dataset, which was one of the modules of my course and which I am used to working with, or at least I know the expected result. – Rezende Jun 16 '18 at 13:17

2 Answers

I finally managed to solve this problem. Instead of using a for loop, I used this:

    dataset['class'] = dataset['class'].map(class_mapping)

I didn't need a for loop because `.map` applies the mapping to every value in the column for me.
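As a rough sketch of how this behaves (the three-row DataFrame below is made up, not the real Iris file): `Series.map` with a dictionary looks up each value and returns NaN for any value that is not a key. That may be one explanation for the 'NaN' mentioned in the comments, for example if the line is run a second time on a column that has already been converted, or if the labels in the file do not match the dictionary keys exactly. This is only a guess at the cause.

```python
import pandas as pd

class_mapping = {"Iris-setosa": 1, "Iris-versicolor": 2, "Iris-virginica": 3}

# Made-up sample rows standing in for the last column of the Iris data.
dataset = pd.DataFrame(
    {"class": ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]}
)

# First pass: every label is a key in the dictionary, so each maps to an int.
dataset["class"] = dataset["class"].map(class_mapping)
first_pass = dataset["class"].tolist()

# Second pass: the column now holds 1, 2, 3, which are not keys of
# class_mapping, so every lookup misses and the result is NaN.
second_pass = dataset["class"].map(class_mapping)
all_nan = bool(second_pass.isna().all())

print(first_pass)
print(second_pass)
```

Checking `dataset['class'].unique()` before mapping is a quick way to confirm the labels match the dictionary keys.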

Rezende

I came across some code that uses `.map` as follows:

  def get_one_shot_iterator(self):
    """Gets an iterator that iterates across the dataset once.

    Returns:
      An iterator of type tf.data.Iterator.
    """

    files = self._get_all_files()

    dataset = (
        tf.data.TFRecordDataset(files, num_parallel_reads=self.num_readers)
        .map(self._parse_function, num_parallel_calls=self.num_readers)
        .map(self._preprocess_image, num_parallel_calls=self.num_readers))

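It seems that `.map` is applied twice here, once with `_parse_function` and once with `_preprocess_image`. Note that this is the `map` method of a TensorFlow `tf.data` dataset rather than of a pandas column. Hope this helps.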

zheyuanWang