I have a dataset with 10,000 observations, 17 features, and one column of integer labels (1 through 4), so its shape is 10,000 × 18. I want to create a list of arrays in which each array holds one block of consecutive rows that share the same label. For example, if the first 10 rows are labeled 1,1,1,2,2,3,1,1,1,3, they should produce five arrays (rows labeled 1,1,1, then 2,2, then 3, then 1,1,1, then 3). I first tried aggregating by label in pandas, but that does not work because the list then contains only four arrays, one per label value. Any ideas on how to code this in NumPy or pandas?
1 Answer
First get the unique labels, then select the rows that belong to each one:
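# Build one DataFrame per distinct label value, keyed by that label
unique_labels = df["label_col"].unique()
label_blocks = {}
for label in unique_labels:
    # Select every row carrying this label
    block_df = df.loc[df["label_col"] == label]
    label_blocks[label] = block_df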
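
Note that selecting by label value gives one entry per distinct label, so at most four here. If the goal is one array per run of consecutive identical labels, as in the question's example, a common approach is to mark where the label changes and group on a running counter. The following is only a minimal sketch under assumptions: the helper name `split_into_label_runs`, the feature column `x`, and the toy frame `demo` are hypothetical, and the label column is assumed to be called `label_col` as above.

```python
import numpy as np
import pandas as pd


def split_into_label_runs(df, label_col):
    """Return a list of arrays, one per run of consecutive equal labels."""
    labels = df[label_col]
    # A new run starts wherever the label differs from the previous row's label.
    # The first row compares against NaN, so it always starts run 1.
    run_id = (labels != labels.shift()).cumsum().rename("run_id")
    # sort=False keeps the runs in their original row order.
    return [block.to_numpy() for _, block in df.groupby(run_id, sort=False)]


# Toy data using the label sequence from the question; "x" is a made-up feature.
demo = pd.DataFrame({
    "x": np.arange(10),
    "label_col": [1, 1, 1, 2, 2, 3, 1, 1, 1, 3],
})

blocks = split_into_label_runs(demo, "label_col")
run_lengths = [len(b) for b in blocks]
```

On the toy data this produces five arrays with 3, 2, 1, 3, and 1 rows. Each array still includes the label column; drop it before calling `to_numpy()` if only the features are needed.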

Laggs