Is this a valid way of loading subsets of a dask dataframe to memory:
    i = 0
    while i < len_df:
        j = i + batch_size
        if j > len_df:
            j = len_df
        subset = df.loc[i:j, 'source_country_codes'].compute()
        i = j  # advance to the next batch
I read somewhere that this may not be correct because of how dask assigns index values: it divides the larger dataframe into smaller pandas dataframes, so the index is not necessarily a single running row number. Also, I don't think dask dataframes have an iloc attribute.
I am using dask version 0.15.2.
In terms of use cases, this would be a way of loading batches of data into a deep learning framework (say, Keras).