Extracting specific selected columns to new DataFrame as a copy

Question

I have a pandas DataFrame with four columns and I want to create a new DataFrame that has only three of them. This question is similar to "Extracting specific columns from a data frame", but for pandas rather than R. The following code does not work: it raises an error and is certainly not the pandas way to do it.

import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = pd.DataFrame(zip(old.A, old.C, old.D)) 
# raises TypeError: data argument can't be an iterator

What is the pandas way to do it?

score 665 · Accepted Answer · edited Dec 21 '16 at 23:39

There is a way of doing this and it actually looks similar to R

new = old[['A', 'C', 'D']].copy()

Here you are just selecting the columns you want from the original DataFrame and assigning the result to a new variable. If you want to modify the new DataFrame at all, you'll probably want to use .copy() to avoid a SettingWithCopyWarning.
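To make the effect of .copy() concrete, here is a minimal sketch using the question's frame. Doubling column A is only an illustrative modification and is not part of the original answer.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Select three columns and make the result independent of `old`.
new = old[['A', 'C', 'D']].copy()

# Illustrative modification: this changes `new` only.
new['A'] = new['A'] * 2

print(new)
print(old)  # column A of `old` still holds the original values
```

Because new is an explicit copy, later assignments to it neither touch old nor depend on how pandas handled the selection internally.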

An alternative method is to use filter which will create a copy by default:

new = old.filter(['A','C','D'], axis=1)

Finally, depending on the number of columns in your original dataframe, it might be more succinct to express this using a drop (this will also create a copy by default):

new = old.drop('B', axis=1)
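As a sanity check, the sketch below selects columns A, C and D from the question's frame with each of the three approaches and compares the results. The variable names by_brackets, by_filter and by_drop are made up for this illustration.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

by_brackets = old[['A', 'C', 'D']].copy()        # explicit list of labels
by_filter = old.filter(['A', 'C', 'D'], axis=1)  # keep the listed labels
by_drop = old.drop('B', axis=1)                  # remove the unwanted label

# DataFrame.equals compares labels, dtypes and values.
print(by_brackets.equals(by_filter))
print(by_brackets.equals(by_drop))
```

Which form reads best depends on whether it is shorter to list the columns you keep or the ones you discard.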

edited Dec 21 '16 at 23:39

maxymoo

answered Jan 08 '16 at 17:51

johnchase

A caution if just copying one column: In `old[['A']].copy()`, the double square brackets are required to create a new data frame. Note that `old['A'].copy()` will only create a Series. – intotecho Feb 01 '19 at 02:18
Thank you for this amazing explanation. I was having a lot of trouble with creating new columns (after creating a new dataframe without the copy method). Stumped me. Your answer finally helped me get to the bottom of it. – Ejaz Ahmed Feb 08 '23 at 12:42
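The single- versus double-bracket distinction raised in the comments can be checked directly. This is a small sketch on the question's frame; the names as_frame and as_series are illustrative only.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

as_frame = old[['A']].copy()   # list of labels -> one-column DataFrame
as_series = old['A'].copy()    # single label   -> Series

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)
```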

score 59 · Answer 2 · answered Jun 11 '19 at 18:09

The easiest way is:

new = old[['A','C','D']]

answered Jun 11 '19 at 18:09

stidmatt

This isn't making a copy unless you explicitly call .copy() – Sylvain Oct 30 '19 at 02:23
This copies by default. – Nguai al Feb 05 '20 at 06:49
@Nguaial The behaviour of simple indexing is not specified. You will not know whether you get a copy or a view. See the documentation for more details: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy – Ole Fass May 05 '20 at 09:38
As mentioned in the comment above, this will create a view and not a copy. – le_llama Jun 22 '21 at 08:57

score 20 · Answer 3 · edited Feb 25 '20 at 15:19

Another simpler way seems to be:

new = pd.DataFrame([old.A, old.B, old.C]).transpose()

Here old.column_name gives you a Series. Make a list of all the column Series you want to retain and pass it to the DataFrame constructor. Each Series becomes a row, so a transpose is needed to turn them back into columns.

In [14]: pd.DataFrame([old.A, old.B, old.C]).transpose()
Out[14]: 
   A   B    C
0  4  10  100
1  5  20   50

edited Feb 25 '20 at 15:19

MarredCheese

answered Jan 15 '19 at 06:50

Hit

Works, but not if column_name has special characters. – jimh May 03 '19 at 08:32
Oh, I had not thought of that. – Hit May 03 '19 at 11:23
@jimh In that case you can use old['column_name'], I believe. – Liz Jan 09 '23 at 18:33
@Liz Yes, but that is not in the solution. – jimh Jan 09 '23 at 20:20
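A sketch of the bracket-based variant suggested in the comments, for a column name that cannot be written as an attribute. The frame and the label 'unit price' are hypothetical and chosen only to show the problem.

```python
import pandas as pd

# Hypothetical frame: 'unit price' contains a space, so attribute access
# (frame.unit price) is not valid Python.
frame = pd.DataFrame({'unit price': [4, 5], 'B': [10, 20], 'C': [100, 50]})

# Same construction as in the answer, but with bracket access.
new = pd.DataFrame([frame['unit price'], frame['C']]).transpose()

print(new)
```

Bracket access works for any label, so it is the safer habit when column names come from external data.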

score 15 · Answer 4 · answered Sep 24 '19 at 09:05

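Select columns by index:

# selected column positions: 1, 6, 7 (assumes a frame with at least 8 columns;
# the question's 4-column frame would need e.g. [0, 2, 3] for A, C, D)
new = old.iloc[:, [1, 6, 7]].copy()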
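Applied to the question's four-column frame, a positional selection could look like the sketch below. The positions 0, 2 and 3 are an assumption that follows from the column order A, B, C, D.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Positions are zero-based: 0 -> A, 2 -> C, 3 -> D.
new = old.iloc[:, [0, 2, 3]].copy()

print(new.columns.tolist())
```

Positional selection is handy when labels are unknown or duplicated, but it silently picks different columns if the column order changes.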

answered Sep 24 '19 at 09:05

sailfish009

score 7 · Answer 5 · answered Jun 11 '19 at 17:45

As far as I can tell, you don't necessarily need to specify the axis when using the filter method.

new = old.filter(['A','B','D'])

returns the same dataframe as

new = old.filter(['A','B','D'], axis=1)
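A quick way to check this on the question's frame (the names without_axis and with_axis are illustrative only):

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

without_axis = old.filter(['A', 'B', 'D'])
with_axis = old.filter(['A', 'B', 'D'], axis=1)

# For a DataFrame, filter works on the columns unless told otherwise.
print(without_axis.equals(with_axis))
```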

answered Jun 11 '19 at 17:45

Ellen

score 7 · Answer 6 · answered Dec 14 '21 at 14:05

As an alternative:

new = pd.DataFrame().assign(A=old['A'], C=old['C'], D=old['D'])
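For reference, a runnable version of this approach on the question's frame. It is only a sketch; note that assign takes keyword arguments, so this explicit form assumes the column names are valid Python identifiers.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Start from an empty frame and add the wanted columns one by one.
new = pd.DataFrame().assign(A=old['A'], C=old['C'], D=old['D'])

print(new)
```

assign returns a new DataFrame, so further changes to new are made on that object rather than on old.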

answered Dec 14 '21 at 14:05

Dimitris Paraschakis

score 6 · Answer 7 · edited Apr 08 '19 at 11:15

Generic functional form:

def select_columns(data_frame, column_names):
    """Return a copy of data_frame restricted to column_names."""
    new_frame = data_frame.loc[:, column_names].copy()
    return new_frame

Applied to your problem above:

selected_columns = ['A', 'C', 'D']
new = select_columns(old, selected_columns)

edited Apr 08 '19 at 11:15

Jeril

answered Apr 08 '19 at 11:04

Deslin Naidoo

score 0 · Answer 8 · answered Jan 24 '20 at 15:41

If you want to have a new data frame then:

import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = old[['A', 'C', 'D']]

answered Jan 24 '20 at 15:41

Ali.E

Dangerous; this isn't making a copy. – Pranab Dec 05 '21 at 07:55
What's the difference between a copy and a copy of a slice of a DataFrame? – s.paszko Jul 12 '22 at 14:09

score 0 · Answer 9 · answered Oct 25 '21 at 23:17

You can drop the unwanted labels from the columns index and select with what remains:

df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3], 'D': [4, 4]})

df[df.columns.drop(['B', 'C'])]

or

df.loc[:, df.columns.drop(['B', 'C'])]

Output:

   A  D
0  1  4
1  1  4
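The same idea applied to the question's frame, as a sketch. The trailing .copy() is an addition here (the answer above does not call it), included because the question asks for an independent copy; the name keep is illustrative.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Index.drop returns a new Index without the listed labels.
keep = old.columns.drop(['B'])

new = old[keep].copy()
print(list(keep))
print(new)
```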

answered Oct 25 '21 at 23:17

Mykola Zotko

score 0 · Answer 10 · answered Mar 16 '23 at 07:53

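You can also use get() to select the columns into a new DataFrame. As far as I can tell, get() goes through the same indexing as old[[...]], so add .copy() if you plan to modify the result and want to avoid a SettingWithCopyWarning.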

new = old.get(['A', 'C', 'D'])

Also, filter selects by column labels by default, so the following works.

new = old.filter(['A', 'C', 'D'])

axis= is only needed to select rows. For example, old.filter([0], axis=0) selects the row labeled 0 (the first row here).
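To illustrate that axis=0 matches row labels rather than positions, here is a small sketch. The frame and its index labels 'x' and 'y' are hypothetical.

```python
import pandas as pd

# Hypothetical frame with string row labels instead of 0, 1.
frame = pd.DataFrame({'A': [4, 5], 'C': [100, 50]}, index=['x', 'y'])

# Keep only the row whose label is 'y'.
row_y = frame.filter(['y'], axis=0)

print(row_y)
```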

If new is an already existing dataframe, then assign() also works (if you want to keep the old columns with their original column names).

new = pd.DataFrame()
new = new.assign(**old[['A', 'C', 'D']])
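When new already holds data, the same call keeps its existing columns and appends the selected ones. A sketch, assuming a hypothetical existing column Z and string column labels (required for ** unpacking into keyword arguments):

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Hypothetical pre-existing frame sharing the same row labels as `old`.
new = pd.DataFrame({'Z': [1, 2]})

# Unpack the selected columns as keyword arguments to assign().
new = new.assign(**old[['A', 'C', 'D']])

print(new.columns.tolist())
```

The assigned columns are Series, so they are aligned to new by row label; rows of new with no matching label in old would get missing values.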
