
I am trying to create a 2D array so that I can draw a heatmap with matplotlib.pyplot, similar to the example "A simple categorical heatmap".

I have looked at the solutions in "How to select rows from a DataFrame based on column values?" and "Return single cell value from Pandas DataFrame", but I cannot get them to work for my purpose.

Here is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

age = np.unique(ageVehicle['Age'])
vehicle = np.unique(ageVehicle['_Vehicle_Type'])

ageVehicleType = np.array([])
innerList = np.array([])

for i in age:
    for j in vehicle:
        if len(innerList) == len(vehicle) - 1:
            innerList+=(int(ageVehicle.loc[(ageVehicle['_Vehicle_Type'] == j) & (ageVehicle['Age'] == i)]['_Count(vehicle_Type)'].values))
            ageVehicleType.append(innerList)
            innerList = np.array([])
            break
        else: 
            innerList+=(int(ageVehicle.loc[(ageVehicle['_Vehicle_Type'] == j) & (ageVehicle['Age'] == i)]['_Count(vehicle_Type)'].values))

fig, ax = plt.subplots()
im = ax.imshow(ageVehicleType)

# We want to show all ticks...
ax.set_xticks(np.arange(len(vehicle)))
ax.set_yticks(np.arange(len(age)))
# ... and label them with the respective list entries
ax.set_xticklabels(vehicle)
ax.set_yticklabels(age)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")

fig.tight_layout()
plt.show()

My dataframe ageVehicle has three columns: Age, _Vehicle_Type and _Count(vehicle_Type). In the nested loops (for i ... / for j ...), I am trying to build one 1D array innerList per age and combine these into the 2D array ageVehicleType. The age and vehicle arrays contain the unique values of Age and _Vehicle_Type in the ageVehicle dataframe.
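The loop structure described above can be made to work with a few changes. This is a minimal sketch on hypothetical toy data (the values are made up; only the column names come from the question). Each row is collected in a plain Python list, because np.ndarray has no append method and innerList += x adds x to every existing element rather than appending it (on an empty array it does nothing). Summing the matching counts gives 0 for a missing combination, so every row gets exactly one entry per vehicle and no length check or break is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the same long format as ageVehicle:
# one row per (Age, _Vehicle_Type) pair together with its count.
ageVehicle = pd.DataFrame({
    'Age': [8, 8, 9, 9, 10],
    '_Vehicle_Type': ['bmw', 'toyota', 'bmw', 'tesla', 'toyota'],
    '_Count(vehicle_Type)': [3, 5, 1, 2, 4],
})

age = np.unique(ageVehicle['Age'])
vehicle = np.unique(ageVehicle['_Vehicle_Type'])

rows = []  # plain Python list: supports .append, unlike np.ndarray
for i in age:
    inner = []
    for j in vehicle:
        mask = (ageVehicle['Age'] == i) & (ageVehicle['_Vehicle_Type'] == j)
        # .sum() returns 0 when the pair is missing and adds up duplicate
        # rows, so it always yields exactly one number per cell
        inner.append(int(ageVehicle.loc[mask, '_Count(vehicle_Type)'].sum()))
    rows.append(inner)

# Rows follow `age`, columns follow `vehicle` (both sorted by np.unique)
ageVehicleType = np.array(rows)
print(ageVehicleType)
```

The resulting array has shape (len(age), len(vehicle)), which is what ax.imshow expects, and its axes line up with the age and vehicle tick labels.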

For example:

age = [8,9,10,11,12,13,14,15,16]

vehicle = ['toyota', 'bmw', 'mazda', 'benz', 'tesla']

_Count(vehicle_Type) is the number of occurrences of each combination of age and vehicle.

The 2D array ageVehicleType is meant to hold a value for every possible combination of age and vehicle in ageVehicle; these values determine the cell colors of the heatmap.
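A 2D array like this does not have to be assembled by hand: pandas can reshape the long format (one row per age/vehicle pair) into the wide age × vehicle grid with DataFrame.pivot_table. The sketch below assumes toy data like the question's (the values are hypothetical). fill_value=0 covers combinations that never occur, and aggfunc='sum' merges duplicate rows. DataFrame.reindex then extends the grid to ages or vehicles that do not appear in the data at all; the explicit lists passed to it are hypothetical.

```python
import pandas as pd

# Hypothetical toy data in the same long format as ageVehicle
ageVehicle = pd.DataFrame({
    'Age': [8, 8, 9, 9, 10],
    '_Vehicle_Type': ['bmw', 'toyota', 'bmw', 'tesla', 'toyota'],
    '_Count(vehicle_Type)': [3, 5, 1, 2, 4],
})

# Rows = Age, columns = vehicle type, cells = counts.
# Missing (Age, vehicle) pairs become 0; duplicate pairs are summed.
grid = ageVehicle.pivot_table(
    index='Age',
    columns='_Vehicle_Type',
    values='_Count(vehicle_Type)',
    aggfunc='sum',
    fill_value=0,
)
print(grid)

values = grid.to_numpy()  # plain 2D array suitable for ax.imshow

# Optional: force a full cross-product over chosen ages/vehicles,
# including ones absent from the data (these lists are hypothetical)
full = grid.reindex(
    index=range(8, 12),
    columns=['bmw', 'mazda', 'tesla', 'toyota'],
    fill_value=0,
)
print(full)
```

grid.to_numpy() can take the place of ageVehicleType in ax.imshow(...), with grid.columns and grid.index supplying the x and y tick labels, so the tick labels are guaranteed to match the array's row and column order.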

Questions:

  1. The more important question: I already have the counts (to use for coloring the heatmap cells) in the column _Count(vehicle_Type). Is it possible to use this column of the ageVehicle dataframe directly to build the heatmap, instead of creating a 2D array that covers all combinations of age and vehicle?

  2. Does the 2D array ageVehicleType need to cover the full cross-product of age and vehicle? If so, the logic of the code may need to change.

  3. I am getting the error below. I'd appreciate help on how to rewrite my conditions to resolve it:

TypeError                                 Traceback (most recent call last)
<ipython-input-54-4e2a48f8339f> in <module>
     15         else:
     16             innerList+=(int(ageVehicle.loc[(ageVehicle['_Vehicle_Type'] == j) & (ageVehicle['Age'] == i)]\
---> 17                                  ['_Count(vehicle_Type)'].values))
     18 

TypeError: only size-1 arrays can be converted to Python scalars
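The message means int() received an array that does not contain exactly one element. A plausible cause, assumed here because the full data isn't shown, is that some (Age, _Vehicle_Type) pair has no row in ageVehicle (an empty array) or more than one row. The sketch below reproduces both cases on hypothetical toy data using the same selection as the question, then shows that summing the selected counts always yields a single number (0 for a missing pair).

```python
import pandas as pd

# Hypothetical toy data: (8, 'bmw') appears twice, (9, 'bmw') not at all
ageVehicle = pd.DataFrame({
    'Age': [8, 8, 9],
    '_Vehicle_Type': ['bmw', 'bmw', 'tesla'],
    '_Count(vehicle_Type)': [3, 4, 2],
})

def lookup(df, age, vehicle):
    """Return the same .values array the question's loop passes to int()."""
    mask = (df['_Vehicle_Type'] == vehicle) & (df['Age'] == age)
    return df.loc[mask]['_Count(vehicle_Type)'].values

missing = lookup(ageVehicle, 9, 'bmw')    # no matching row: size 0
duplicate = lookup(ageVehicle, 8, 'bmw')  # two matching rows: size 2
print(missing.size, duplicate.size)

errors = []
for arr in (missing, duplicate):
    try:
        int(arr)  # fails because the array does not have exactly one element
    except TypeError as exc:
        errors.append(str(exc))
        print('TypeError:', exc)

# Robust alternative: .sum() returns one number whatever the match count
def safe_count(df, age, vehicle):
    mask = (df['_Vehicle_Type'] == vehicle) & (df['Age'] == age)
    return int(df.loc[mask, '_Count(vehicle_Type)'].sum())

print(safe_count(ageVehicle, 9, 'bmw'), safe_count(ageVehicle, 8, 'bmw'))
```

Whether 0 is the right value for a missing pair, and whether duplicates should be summed, depends on what the counts mean; both choices are assumptions in this sketch.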

Thanks in advance.
