The solution in that post assumes the data is already ordered by key, which is different from my case.
If I sort the data before applying that solution, the result is no more concise or efficient than what I've already achieved.
The dataset './melb_data.csv' comes from Kaggle.
The code below draws a horizontal box plot of price by region:
import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict

# Read the CSV into a structured array; column names come from the header row.
data = np.genfromtxt('melb_data.csv',
                     delimiter=',', names=True,
                     dtype=None, encoding=None)

# First pass: collect the prices for each region.
tem1 = defaultdict(list)
for key, value in zip(data['Regionname'], data['Price']):
    tem1[key].append(value)

# Second pass: split the mapping into parallel lists of labels and price groups.
data = defaultdict(list)
for key, value in tem1.items():
    data["Regionname"].append(key)
    data["Price"].append(value)

# Draw one horizontal box per region.
fig, ax = plt.subplots()
ax.boxplot(data['Price'], labels=data['Regionname'], vert=False)
plt.show()
The code uses two for loops to group Price by Regionname. Is there a better way to do this group-by, for example with NumPy methods?
I know it is easier to use pandas to do this, but for some reason, I have to do this without pandas.
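To make it concrete, the kind of thing I have in mind is a sort-then-split approach along the lines of the sketch below. The function name `group_by_key` and the sample arrays are made up for illustration and are not taken from melb_data.csv, and I'm not sure this is the idiomatic way to do it.

```python
import numpy as np

def group_by_key(keys, values):
    """Group values by key with a stable sort followed by np.split.

    Hypothetical helper for illustration only.
    """
    keys = np.asarray(keys)
    values = np.asarray(values)
    # unique_keys comes back sorted; inverse maps every row to its key's index.
    unique_keys, inverse = np.unique(keys, return_inverse=True)
    # A stable sort keeps the original row order inside each group.
    order = np.argsort(inverse, kind="stable")
    # counts[i] rows belong to unique_keys[i]; cumulative sums give the split points.
    counts = np.bincount(inverse)
    groups = np.split(values[order], np.cumsum(counts)[:-1])
    return unique_keys, groups

# Made-up sample rows standing in for the Regionname and Price columns.
regions = np.array(["North", "South", "North", "East", "South"])
prices = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
names, grouped = group_by_key(regions, prices)
```

If this works the way I expect, `grouped` and `names` could be passed to `ax.boxplot` in place of `data['Price']` and `data['Regionname']`, but it still needs a sort, so I don't know whether it is actually any faster than the two loops.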