3

I've been practicing Python for a while now and just got into pandas to start learning DataFrames. I understand that df.drop() removes a column/row based on certain requirements and returns a new df. Is there a way to assign the dropped columns/rows to a new variable for logging purposes?

import pandas as pd
L = ["a","b","c","d","a","a"]
df1 = pd.DataFrame(L)
df1.columns = ['letter']
#print(df1)

df2 = df1.drop(df1.letter == "a", axis=0)
print(df2)

 letter
2      c
3      d
4      a #why is this row not removed?
5      a #why is this row not removed?

However, this doesn't even print a new df2 with all the "a" rows removed (a separate problem; I'm not sure why that is happening).

Assigning the removed column to a new df doesn't work because it is using the initial dataframe df1. I am just unsure how to make two dataframes: one with ONLY the removed rows/columns and one with them taken out.

I would want a df3 that prints:

letter
0      a
4      a
5      a
CuriousDude

2 Answers

2

I would just select the specific rows before dropping them:

df2 = df1.loc[df1.letter == "a"]
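To end up with both halves, one option is to select the rows first and then pass their index labels to drop, which expects labels rather than a boolean Series. The output in the question is consistent with True and False being read as the labels 1 and 0, which would explain why only the first two rows disappeared; that reading is an assumption based on the printed result, and other pandas versions may behave differently. A minimal sketch, rebuilding df1 from the question (the names removed and kept are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"letter": ["a", "b", "c", "d", "a", "a"]})

# Select the rows that are about to be dropped.
removed = df1.loc[df1["letter"] == "a"]

# drop() works on index labels, so hand it the labels of those rows.
kept = df1.drop(removed.index)

print(removed)
print(kept)
```

Both results keep the original index labels, so they can still be matched back to df1 later.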
Bestname
  • I notice that this works without using loc, so df1[df1.letter == 'a'] also works. Is there a reason I need to use loc? Would there be unforeseen issues down the road? – CuriousDude Jan 12 '18 at 22:54
  • It seems that this is mainly for clarity: explicit is better than implicit in code. – Bestname Jan 12 '18 at 23:34
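As a rough check of the point discussed in these comments, the sketch below compares the two spellings for a boolean mask. It also shows two things .loc can do in the same call: pick columns, and assign to the masked rows. The extra column "n" and the names letters_only and flagged are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "letter": ["a", "b", "c", "d", "a", "a"],
    "n": [1, 2, 3, 4, 5, 6],  # hypothetical extra column
})
mask = df1["letter"] == "a"

# For a boolean mask, plain brackets and .loc return the same rows.
same = df1[mask].equals(df1.loc[mask])
print(same)

# .loc can also restrict the columns in the same call.
letters_only = df1.loc[mask, ["letter"]]

# .loc with a row mask and a column label assigns in one step,
# avoiding chained indexing such as df[mask]["n"] = 0.
flagged = df1.copy()
flagged.loc[mask, "n"] = 0
```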
2

Create a mask for your condition. Select the rows to be removed using boolean indexing with that mask. Then reassign df1 to the remaining rows by inverting the mask with ~ (not).

mask = df1['letter'] == 'a'
removed_rows = df1[mask]
df1 = df1[~mask]

>>> df1
  letter
1      b
2      c
3      d

>>> removed_rows
  letter
0      a
4      a
5      a
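Since the question mentions logging, the mask approach could be wrapped in a small helper that returns both pieces and records what was removed. This is only a sketch: split_rows is a hypothetical name, not a pandas function, and the log message format is arbitrary.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


def split_rows(df, mask):
    """Return (kept, removed) for a boolean row mask. Hypothetical helper."""
    removed = df[mask]
    kept = df[~mask]
    # Record how many rows were removed and which index labels they had.
    log.info("removed %d of %d rows: labels %s",
             len(removed), len(df), removed.index.tolist())
    return kept, removed


df1 = pd.DataFrame({"letter": ["a", "b", "c", "d", "a", "a"]})
kept, removed = split_rows(df1, df1["letter"] == "a")
```

The original df1 is left untouched here, so reassigning it (as in the answer) stays an explicit choice for the caller.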
Alexander
  • How do you create a mask when you need to drop columns using a for loop? My condition is not a fixed letter but rather a few calculations that determine which column to drop. I want to keep those dropped columns as a separate dataframe for comparison. Not sure how to implement a mask for this. Don't even know where to start with that one. – CuriousDude Jan 15 '18 at 18:26
  • @DNAngel In general, it is best to avoid `for` loops when working with pandas dataframes. Perhaps you can ask another question providing the minimal information to create what you are trying to achieve (https://stackoverflow.com/help/mcve & https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Alexander Jan 15 '18 at 18:37
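One way the column case might look without a for loop, assuming the calculation can be expressed as one value per column: compute a boolean Series indexed by column name and use it with .loc on the column axis. The data and the rule below (drop columns whose standard deviation is under 0.5) are invented for illustration and are not the commenter's actual criterion.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [5.0, 5.0, 5.0, 5.0],    # constant column
    "z": [0.0, 10.0, 0.0, 10.0],
})

# Hypothetical rule: drop columns whose standard deviation is below 0.5.
# df.std() returns one value per column, so the comparison yields a
# boolean Series indexed by column name.
drop_mask = df.std() < 0.5

# Split the columns the same way the rows were split with a row mask.
removed_cols = df.loc[:, drop_mask]
kept_cols = df.loc[:, ~drop_mask]

print(removed_cols.columns.tolist())
print(kept_cols.columns.tolist())
```

Any per-column calculation that produces a boolean Series with the column names as its index could take the place of the df.std() comparison.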