
I have a dataset with 10,000 observations, 17 features, and one column of integer labels (1 through 4), so its shape is 10,000 × 18 (17 features plus one label). I want to build a list of arrays in which each array holds one contiguous block of rows that share the same label. For example, if the first 10 rows are labeled 1,1,1,2,2,3,1,1,1,3, they should yield five arrays: rows 1-3, rows 4-5, row 6, rows 7-9, and row 10. I first tried grouping by label in pandas, but that does not work because the list then contains only four arrays, one per label. Any ideas on how to code this in NumPy or pandas?

GK89
It's rather difficult to get what you have and expect from your question. Please see [this guide](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Quang Hoang Dec 08 '20 at 22:46

1 Answer


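First get the unique labels, then select the rows for each label:

# Distinct label values, in order of first appearance
unique_labels = df["label_col"].unique()
# Map each label to the DataFrame of all rows carrying that label
label_blocks = {}
for label in unique_labels:
    block_df = df.loc[df["label_col"] == label]
    label_blocks[label] = block_df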
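Note that the loop above produces one DataFrame per distinct label, so with four labels it gives four blocks. If the goal is one array per run of consecutive identical labels, as the question's example suggests, a minimal sketch could look like the following. The helper name `split_consecutive_blocks`, the `block_id` series, and the toy `demo` frame are assumptions made for illustration; only `label_col` comes from the code above.

```python
import numpy as np
import pandas as pd


def split_consecutive_blocks(df, label_col="label_col"):
    """Return a list of arrays, one per run of identical consecutive labels."""
    labels = df[label_col]
    # A new block starts wherever the label differs from the previous row;
    # the cumulative sum turns those change points into a running block id.
    block_id = (labels != labels.shift()).cumsum().rename("block_id")
    # sort=False keeps the blocks in their original row order.
    return [block.to_numpy() for _, block in df.groupby(block_id, sort=False)]


# Toy data: the label sequence from the question plus one made-up feature.
demo = pd.DataFrame({
    "feature_0": np.arange(10, dtype=float),
    "label_col": [1, 1, 1, 2, 2, 3, 1, 1, 1, 3],
})
blocks = split_consecutive_blocks(demo)
block_lengths = [len(b) for b in blocks]
```

Each array keeps the label column alongside the features; drop it before calling `to_numpy()` if only the features are needed.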
Laggs