Simple question: this is my first time using pandas, and I could not find an answer by googling.

I have this dataframe, info:

info.head()
           artist  in_train new_filename
0  Barnett Newman      True   102257.jpg
1  Barnett Newman      True    75232.jpg
2     kiri nichol     False    32145.jpg
.
.
.

Now I want to create two new dataframes and add rows from info to them, depending on whether the in_train column is True or False.

train_info = pd.DataFrame(columns=('artist', 'filename'))
test_info = pd.DataFrame(columns=('artist', 'filename'))

for index, row in info.iterrows():
  if row["in_train"] == True:
    train_info.append(row[["artist", "new_filename"]])
  else:
    test_info.append(row[["artist", "new_filename"]])

For some reason this code assigned all rows in info to test_info and nothing to train_info. How do I solve this?

JKnecht

1 Answer

Looping with iterrows is generally slow and rarely necessary in pandas. You can solve this quite simply with boolean masking:

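import numpy as np
import pandas as pd

# Random stand-in data with the same column names as info
df = pd.DataFrame({'artist':np.random.randint(0,10,10),
                   'in_train':np.random.randint(0,2,10).astype(bool),
                   'new_filename':np.random.rand(10)})

# Select rows with a boolean mask, drop the flag column, and renumber the index
train_info = df[df['in_train']].drop(columns='in_train').reset_index(drop=True)
test_info = df[~df['in_train']].drop(columns='in_train').reset_index(drop=True)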
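
As for why the loop in the question left the target frames unchanged: DataFrame.append returned a new DataFrame rather than modifying the one it was called on, so each result was discarded unless it was assigned back. The method was later removed entirely in pandas 2.0. Here is a minimal sketch of the masking approach applied to data shaped like the question's. The three rows are copied from the info.head output above, the helper name split_by_flag is hypothetical, and renaming new_filename to filename is an assumption based on the column names the question passed to pd.DataFrame.

```python
import pandas as pd

# Stand-in for `info`, built from the three rows shown in the question.
info = pd.DataFrame({
    "artist": ["Barnett Newman", "Barnett Newman", "kiri nichol"],
    "in_train": [True, True, False],
    "new_filename": ["102257.jpg", "75232.jpg", "32145.jpg"],
})


def split_by_flag(frame, flag="in_train"):
    """Hypothetical helper: return (rows where flag is True, rows where it is False)."""
    mask = frame[flag].astype(bool)
    # Keep only the two columns of interest and rename to the target name.
    selected = frame[["artist", "new_filename"]].rename(
        columns={"new_filename": "filename"}
    )
    # ~mask inverts the boolean Series, selecting the complementary rows.
    return (
        selected[mask].reset_index(drop=True),
        selected[~mask].reset_index(drop=True),
    )


train_info, test_info = split_by_flag(info)
print(train_info)
print(test_info)
```

If the rows really do have to be accumulated piece by piece, collecting them in a list and calling pd.concat once at the end is the usual replacement for repeated append calls.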
meTchaikovsky