I'm attempting to load a large data set. I have ~8k day files, each with arrays of hundreds of measurements. I can load a single day file into a set of numpy arrays, which I store in a dictionary. To load all the day files, I initialize a dictionary with the desired keys and a preallocated array for each. Then I loop through the list of files, loading each one and attempting to store its arrays in the larger dictionary:
    import numpy as np

    all_measurements = np.asarray([get_n_measurements(directory, name) for name in files])

    error_files = []
    temp = np.full(all_measurements.sum(), fill_value, dtype=np.float64)
    all_data = {key: temp.copy() for key in sample_file}

    start_index = 0
    for data_file, n_measurements in zip(file_list, all_measurements):
        file_data = one_file(data_file)  # Load one data file into a dict.
        for key, value in file_data.iteritems():  # I've tried .items(), .viewitems() as well.
            try:
                all_data[key][start_index : start_index + n_measurements] = file_data[key]
            except ValueError, msg:
                error_files.append((data_file, msg))
            finally:
                start_index += n_measurements
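For what it's worth, my understanding is that the copy() in the dict comprehension should give every key its own independent array. In isolation that seems to check out (the length, fill value, and keys here are made up just for the test):

    import numpy as np

    temp = np.full(5, -999.0, dtype=np.float64)       # made-up length and fill value
    check = {key: temp.copy() for key in ('a', 'b')}  # same pattern as above
    check['a'][0] = 1.0
    print(check['a'] is check['b'])   # False: the keys hold separate objects
    print(check['b'][0])              # still -999.0, so writing to 'a' left 'b' alone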
I've inspected the results of one_file() and I know that it loads the data properly. However, the combined all_data behaves as if every value is identical across key:value pairs.
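One thing I haven't ruled out is that the per-key arrays in all_data literally share storage. A quick check along these lines (run against the real all_data, with the keys shown in the example below) should tell:

    print(all_data['a'] is all_data['b'])                     # same object?
    print(np.may_share_memory(all_data['a'], all_data['b']))  # overlapping buffers?
    print(np.array_equal(all_data['a'], all_data['b']))       # or merely equal values?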
Here is an example of the data structures:
    all_data = {'a': array([ 0.76290858, 0.83449302, ..., 0.06186873]),
                'b': array([ 0.32939997, 0.00111448, ..., 0.72303435])}

    file_data = {'a': array([ 0.00915347, 0.39020354]),
                 'b': array([ 0.8992421 , 0.18964702])}
In each iteration of the for loop, I attempt to insert the file_data into all_data at the indices [start_index : start_index + n_measurements].
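Spelled out on toy counts (not my real files), the slice bookkeeping I intend is one contiguous block per day file, the same block for every key:

    import numpy as np

    all_measurements = np.array([2, 3, 4])   # pretend three day files
    start_index = 0
    for n_measurements in all_measurements:
        # each file should land in all_data[key][start_index : start_index + n_measurements]
        print((start_index, start_index + n_measurements))   # (0, 2), (2, 5), (5, 9)
        start_index += n_measurements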