I'm fairly new to Python, and I am trying to convert a dictionary comprehension (from Hands-On Data Analysis with Pandas by S. Molin) into a "normal" for loop, purely for practice.
Initially, the data comes from a CSV file and is loaded using NumPy. The result is a structured array in which each CSV row is a single record (of type numpy.void), as follows:
array([('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1), ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0), ('2018-10-13 00:13:46.220', '42km WNW of Sola, Vanuatu', 'mww', 5.7, 'green', 0), ('2018-10-12 21:09:49.240', '13km E of Nueva Concepcion, Guatemala', 'mww', 5.7, 'green', 0), ('2018-10-12 02:52:03.620', '128km SE of Kimbe, Papua New Guinea', 'mww', 5.6, 'green', 1)], dtype=[('time', '<U23'), ('place', '<U37'), ('magType', '<U3'), ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')])
What I am trying to do is turn it into a dictionary whose keys are the column names and whose values are arrays holding each column's values:
{'time': array(['2018-10-13 11:10:23.560', '2018-10-13 04:34:15.580','2018-10-13 00:13:46.220', '2018-10-12 21:09:49.240', '2018-10-12 02:52:03.620'], dtype='<U23'), 'place': array(['262km NW of Ozernovskiy, Russia', '25km E of Bitung, Indonesia', '42km WNW of Sola, Vanuatu','13km E of Nueva Concepcion, Guatemala','128km SE of Kimbe, Papua New Guinea'], dtype='<U37'), 'magType': array(['mww', 'mww', 'mww', 'mww', 'mww'], dtype='<U3'), 'mag': array([6.7, 5.2, 5.7, 5.7, 5.6]), 'alert': array(['green', 'green', 'green', 'green', 'green'], dtype='<U5'), 'tsunami': array([1, 0, 0, 0, 1])}
The dictionary comprehension used for this purpose is:
array_dict = {col: np.array([row[i] for row in data]) for i, col in enumerate(data.dtype.names)}
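To make this reproducible without the CSV file, here is a minimal sketch that builds `data` by hand from two of the rows shown above (an assumption: in the book the array comes from the file, not from a literal) and runs the same comprehension on it:

```python
import numpy as np

# Hand-built stand-in for the array loaded from the CSV (two of the rows above).
data = np.array(
    [
        ('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1),
        ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0),
    ],
    dtype=[('time', '<U23'), ('place', '<U37'), ('magType', '<U3'),
           ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')],
)

# Outer part: one dictionary entry per column name.
# Inner part: a list comprehension that picks field i out of every row.
array_dict = {
    col: np.array([row[i] for row in data])
    for i, col in enumerate(data.dtype.names)
}

print(list(array_dict))   # column names, in dtype order
print(array_dict['mag'])  # values of the 'mag' column
```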
The solution I have so far is:
d = {}
for i,col in enumerate(data.dtype.names):
    for row in data:
        d[col].append(row[i])
I get the following error:
---------
KeyError                                  Traceback (most recent call last)
Input In [51], in <cell line: 2>()
      2 for i,col in enumerate(data.dtype.names):
      3     for row in data:
----> 4         d[col].append(row[i])
KeyError: 'time'
I have researched a bit online, and it could be related to the data type of the "time" column. My guess, although I am fairly sure it is wrong, is that the comprehension creates each column directly as a NumPy array, whereas in my loop I never set it up as one beforehand (hence the problem with the data type).
Any help would be highly appreciated. Many thanks!
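For what it's worth, here is a possible loop version. It is a sketch that assumes the KeyError has nothing to do with the dtype and is raised simply because `d` is still empty when `d[col].append(...)` runs, so the key 'time' does not exist yet. `data` is a hand-built two-row stand-in for the array above, and `values` is a name introduced only for illustration:

```python
import numpy as np

# Hand-built stand-in for the array loaded from the CSV (two of the rows above).
data = np.array(
    [
        ('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1),
        ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0),
    ],
    dtype=[('time', '<U23'), ('place', '<U37'), ('magType', '<U3'),
           ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')],
)

d = {}
for i, col in enumerate(data.dtype.names):
    values = []                 # fresh list for this column (hypothetical helper name)
    for row in data:
        values.append(row[i])   # same as the inner [row[i] for row in data]
    d[col] = np.array(values)   # create the key only once the list is complete

print(d['mag'])
```

Collecting into a plain list first and converting at the end mirrors what the comprehension does. NumPy arrays have no `.append` method, so initialising `d[col]` as an array and appending to it in place would not work either.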