Pandas column access w/column names containing spaces

Question

If I import or create a pandas DataFrame whose column names contain no spaces, I can access a column as an attribute:

from pandas import DataFrame

df1 = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                 'data1': range(7)})

df1.data1

which returns that Series. If, however, the column has a space in its name, it isn't accessible via that method:

from pandas import DataFrame

df2 = DataFrame({'key': ['a','b','d'],
                 'data 2': range(3)})

df2.data 2      # <--- not the droid I'm looking for.

I know I can access it using .xs():

df2.xs('data 2', axis=1)

There's got to be another way. I've googled it like mad and can't think of any other way to google it. I've read all 96 entries here on SO that contain "column" and "string" and "pandas" and could find no previous answer. Is this the only way, or is there something better?

score 83 · Answer 1 · answered May 28 '15 at 18:42

Old post, but this may be interesting: one idea (destructive, but it does the job if you want something quick and dirty) is to rename the columns, replacing spaces with underscores:

df1.columns = [c.replace(' ', '_') for c in df1.columns]
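A minimal sketch of this, applied to the `df2` frame from the question: after the rename, `data 2` becomes `data_2` and dot access works again. It assumes every column label is a string, since `str.replace` is called on each one.

```python
from pandas import DataFrame

df2 = DataFrame({'key': ['a', 'b', 'd'],
                 'data 2': range(3)})

# Replace spaces with underscores in every (string) column label.
# This overwrites the original names in place, hence "destructive".
df2.columns = [c.replace(' ', '_') for c in df2.columns]

print(list(df2.columns))    # ['key', 'data_2']
print(df2.data_2.tolist())  # [0, 1, 2]
```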

answered May 28 '15 at 18:42

AkiRoss

If you want to standardize the columns to lowercase as well, use `df1.columns = [c.lower().replace(' ', '_') for c in df1.columns]` – JAV Apr 20 '17 at 19:16
A nice way to read and cleanup a dataframe is using method chaining. Instead of using a list comprehension to set the `columns` attribute, you can use the `rename` method: `df1 = pandas.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'dat a1': range(7)}).rename(lambda x: x.replace(' ', '_'), axis=1)` – Avi Apr 11 '19 at 21:44

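An alternative for stray leading/trailing whitespace is the `strip()` function: `df1.columns = [c.strip() for c in df1.columns]` (note that it does not remove spaces inside a name such as `'data 2'`) – Nicola Feb 09 '20 at 23:14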
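A sketch combining the comments above: lowercase the labels and replace spaces in a single `DataFrame.rename` call with a callable. Passing `columns=` is equivalent to the `axis=1` form; unlike assigning to `.columns`, it returns a new frame and leaves the original alone. The names `raw` and `clean` are just illustrative.

```python
import pandas as pd

raw = pd.DataFrame({'Key': ['b', 'b', 'a'],
                    'Dat A1': range(3)})

# rename() applies the callable to every column label and returns a new
# DataFrame, so it fits into a method chain and does not mutate `raw`.
clean = raw.rename(columns=lambda c: c.lower().replace(' ', '_'))

print(list(clean.columns))  # ['key', 'dat_a1']
print(list(raw.columns))    # ['Key', 'Dat A1'] (unchanged)
print(clean.dat_a1.sum())   # 3
```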

score 74 · Accepted Answer · edited Jan 18 '22 at 22:16

I think the default way is to use bracket indexing instead of dot notation:

import pandas as pd

df1 = pd.DataFrame({
    'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
    'dat a1': range(7)
})

df1['dat a1']

Other methods, like accessing the column as an attribute, are mostly for convenience.
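To illustrate why bracket indexing is the more general option: it accepts any label, whereas attribute access only works for labels that are valid Python identifiers and don't collide with an existing DataFrame attribute. The `'count'` column below is a made-up example of such a collision.

```python
import pandas as pd

df = pd.DataFrame({'dat a1': range(3), 'count': [5, 6, 7]})

# Bracket indexing works for any column label, spaces included.
s1 = df['dat a1']

# .loc with a column label is an equivalent, more explicit alternative.
s2 = df.loc[:, 'dat a1']

# df.count resolves to the DataFrame.count method, not the 'count' column,
# so brackets are needed here too.
print(callable(df.count))    # True
print(df['count'].tolist())  # [5, 6, 7]
print(s1.equals(s2))         # True
```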

edited Jan 18 '22 at 22:16

Henry Ecker

answered Dec 07 '12 at 07:36

Rutger Kassies

Thanks, that one shouldn't have stumped me like it did. – Brad Fair Dec 07 '12 at 14:13
Thanks for the comment. I normally use dot notation to access my columns (`df.col_name`), but I just learned this trick for accessing column names containing spaces: `df["column name with space"]`. – theteddyboy Oct 12 '16 at 10:31

score 21 · Answer 3 · answered May 14 '20 at 08:45

If you need to pass a column name containing spaces to a pandas method that takes keyword arguments, such as `assign`, you can unpack a dictionary of inputs:

df.assign(**{'space column': (lambda x: x['space column2'])})
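A runnable version of this idea on a small made-up frame (the `* 10` is only there so the new column differs visibly from the source column):

```python
import pandas as pd

df = pd.DataFrame({'space column2': [1, 2, 3]})

# Keyword argument names can't contain spaces, so build a dict and unpack
# it with ** to pass the new column name to assign().
out = df.assign(**{'space column': lambda x: x['space column2'] * 10})

print(out.columns.tolist())          # ['space column2', 'space column']
print(out['space column'].tolist())  # [10, 20, 30]
```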

answered May 14 '20 at 08:45

Abuw

This is the solution I was looking for. – quoniam May 01 '22 at 08:00

score 3 · Answer 4 · answered Mar 15 '19 at 13:27

While the accepted answer works for column-specification when using dictionaries or []-selection, it does not generalise to other situations where one needs to refer to columns, such as the assign method:

> df.assign("data 2" = lambda x: x.sum(axis=1))
SyntaxError: keyword can't be an expression

answered Mar 15 '19 at 13:27

Olsgaard

Yes, I would love a solution to this since there is no chainable alternative to `assign` that I know of. I guess this should be a separate SO question. – Avi Apr 11 '19 at 20:19

the answer is to pass a dictionary as a keyword argument. `df.assign(**{"data 2": lambda x: x.sum(axis=1)})` – Anders Swanson Jul 07 '20 at 19:18

score 3 · Answer 5 · answered Feb 11 '22 at 18:56

You can do it with df['Column Name']

answered Feb 11 '22 at 18:56

Emilio

That's a straightforward one. – Poornima Devi Jul 08 '22 at 11:23

Jochen Haßfurter · Answer 6 · 2019-12-12T07:43:16.697

-1

If you want to filter rows, that also works with column names containing spaces, e.g. keeping only rows whose value is neither NULL nor an empty string:

df_package[(df_package['Country_Region Code'].notnull()) &
           (df_package['Country_Region Code'] != '')]

as I figured out thanks to Rutger Kassies' answer.
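A self-contained sketch of this kind of filter on hypothetical sample data, keeping only rows whose value is neither null nor an empty string. The two masks are combined with `&`; combining them with `|` would let every row through. It also shows `DataFrame.query` with a backtick-quoted column name, which assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the answer.
df_package = pd.DataFrame({'Country_Region Code': ['DE', None, '', 'FR']})

col = df_package['Country_Region Code']

# Bracket indexing: keep rows that are not null AND not empty.
kept = df_package[col.notnull() & (col != '')]

# Same result via dropna() plus query(); backticks quote the spaced name.
kept_q = (df_package
          .dropna(subset=['Country_Region Code'])
          .query("`Country_Region Code` != ''"))

print(kept['Country_Region Code'].tolist())    # ['DE', 'FR']
print(kept_q['Country_Region Code'].tolist())  # ['DE', 'FR']
```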

edited Dec 12 '19 at 07:43

answered Dec 11 '19 at 13:51

Jochen Haßfurter

Isn't this obvious once you discover how to refer to them? – robertspierre Oct 14 '22 at 12:54
