5

I haven't found any answers that I could apply to my problem, so here goes:

I have an initial dataframe of images that I would like to split into two, based on the description of each image, which is a string in the "Description" column.

My problem is that the descriptions are not written consistently. Here's an example of what I mean:

[image: example values from the "Description" column]

Some images are accelerated and others aren't. That's the criterion I want to use to split the dataset.

However, the descriptions vary even within the accelerated and non-accelerated groups.

My strategy would be to rename every string that has "ACC" in it - this would cover all accelerated images - to "ACCELERATED IMAGE".

Then I could do:

df_Accl = df[df.Description == "ACCELERATED IMAGE"]
df_NonAccl = df[df.Description != "ACCELERATED IMAGE"]

How can I achieve this? This is just a strategy that I came up with; if there's a more efficient way of doing this, feel free to suggest it.

J. Devez
  • Please try to avoid images and include some data that can be easily loaded next time. – Franco Piccolo Nov 18 '18 at 17:58
  • Related: [Splitting a dataframe based on condition](https://stackoverflow.com/questions/52966811/splitting-a-dataframe-based-on-condition/52967219#52967219) – jpp Nov 18 '18 at 18:07

2 Answers

8

You can use str.contains to build a boolean mask, then filter with boolean indexing.

To invert the mask, use ~, which selects the rows that do not contain ACC:

mask = df.Description.str.contains("ACC")
df_Accl = df[mask]
df_NonAccl = df[~mask]
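
To try this end to end, here is a minimal, self-contained sketch. The sample descriptions are hypothetical (the real ones were only shown in a screenshot), and passing na=False is an assumption on my part: it treats missing descriptions as non-accelerated so that the mask stays strictly boolean and ~ works on it.

```python
import pandas as pd

# Hypothetical sample data; the real descriptions are not shown in the post.
df = pd.DataFrame({
    "Description": ["ACC T1 AXIAL", "T2 SAGITTAL", "T1 ACCEL 3D", None, "FLAIR"],
})

# na=False: rows with a missing description count as "does not contain ACC",
# which keeps the mask purely boolean so it can be inverted with ~.
mask = df["Description"].str.contains("ACC", na=False)

df_Accl = df[mask]       # rows whose description contains "ACC"
df_NonAccl = df[~mask]   # every other row, including missing descriptions

print(df_Accl)
print(df_NonAccl)
```

Note that str.contains is case-sensitive by default, so a description written as "acc" would end up in df_NonAccl unless you also pass case=False.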
jezrael
0

You can use str.contains to find the rows that contain the substring ACC. It returns a boolean Series with one entry per row:

df['Description'].str.contains('ACC')
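
If you would rather follow the renaming strategy from the question, that boolean Series can be used with df.loc to overwrite the matching descriptions. This is only a sketch with hypothetical sample data; regex=False is my own choice here, so that "ACC" is matched as a literal substring rather than as a regular expression.

```python
import pandas as pd

# Hypothetical sample data; the real descriptions are not shown in the post.
df = pd.DataFrame({"Description": ["ACC T1 AXIAL", "T2 SAGITTAL", "T1 ACCEL 3D"]})

# Boolean Series: True where the description contains the literal text "ACC".
is_acc = df["Description"].str.contains("ACC", regex=False)

# Rename every matching description, as proposed in the question.
df.loc[is_acc, "Description"] = "ACCELERATED IMAGE"

# The equality-based split from the question now works as intended.
df_Accl = df[df.Description == "ACCELERATED IMAGE"]
df_NonAccl = df[df.Description != "ACCELERATED IMAGE"]
```

Keep in mind that this overwrites the original descriptions; filtering on the mask directly avoids that.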
Franco Piccolo