13

According to this thread: SO: Column names to list

It should be straightforward to convert the column names to a list. But if I do:

df.columns.tolist()

I get:

[u'q_igg', u'q_hcp', u'c_igg', u'c_hcp']

I know I could strip the u and the ' characters myself, but I would like to get the clean names as a list without any workaround. Is that possible?

Community
Moritz

6 Answers

23

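Or, you could try the following (note that get_values() was deprecated in pandas 0.25 and removed in 1.0, so on newer versions df.columns.values or df.columns.to_numpy() would be the equivalent):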

df2 = df.columns.get_values()

which will give you:

array(['q_igg', 'q_hcp', 'c_igg', 'c_hcp'], dtype=object)

then:

df2.tolist()

which gives you:

['q_igg', 'q_hcp', 'c_igg', 'c_hcp']
Pav K.
gincard
4

A simple way, where df is the name of the DataFrame variable:

df.columns.to_list()

This returns a list of all the column names.
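As a minimal sketch of the above, assuming pandas is installed and Python 3 is in use: the frame below is a hypothetical empty DataFrame that only borrows the column names from the question. On Python 3 the names come back as plain str objects, so no u prefix appears when the list is printed.

```python
import pandas as pd

# Hypothetical frame that reuses the column names from the question.
df = pd.DataFrame(columns=["q_igg", "q_hcp", "c_igg", "c_hcp"])

names_a = df.columns.tolist()   # long-standing spelling
names_b = df.columns.to_list()  # alias available in newer pandas releases
names_c = list(df)              # iterating a DataFrame yields its column labels

print(names_a)
```

All three spellings should produce the same list, so the choice between them is mostly a matter of taste and of the pandas version at hand.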

Community
3

The list [u'q_igg', u'q_hcp', u'c_igg', u'c_hcp'] contains Unicode strings: the u prefix indicates that each one is a Unicode string, and the ' characters are just the quotes that delimit each string in the printed representation. Neither is part of the names themselves, so you can use these names in any way you'd like in your code. See the Unicode HOWTO for more details on Unicode strings in Python 2.x.
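A short sketch of that point, using a hand-written list in place of the real column names (the u'' literal syntax is accepted by Python 2 and by Python 3.3+): the prefix and the quotes show up only when the list as a whole is displayed, not in the individual strings.

```python
# Hand-written stand-in for df.columns.tolist() from the question.
cols = [u'q_igg', u'q_hcp', u'c_igg', u'c_hcp']

first = cols[0]
print(first)             # q_igg
print(len(first))        # 5
print(first == 'q_igg')  # True
```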

Simeon Visser
1

If you're just interested in printing the names without any quotes or Unicode indicators, you could do something like this:

In [19]: print "[" + ", ".join(df) + "]"
[q_igg, q_hcp, c_igg, c_hcp]
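The session above uses the Python 2 print statement. A rough Python 3 equivalent might look like the following; the cols list is a stand-in for the column labels, and the str() call is an added precaution in case some labels are not strings.

```python
# Stand-in for the column labels; with a real frame this could be list(df).
cols = ['q_igg', 'q_hcp', 'c_igg', 'c_hcp']

# Join the labels with commas and wrap them in brackets by hand.
formatted = "[" + ", ".join(str(c) for c in cols) + "]"
print(formatted)  # [q_igg, q_hcp, c_igg, c_hcp]
```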
chrisb
1

As already mentioned, the u means that the strings are Unicode. Anyway, the cleanest way would be to convert the column names to ASCII or something similar.

In [4]: cols
Out[4]: [u'q_igg', u'q_hcp', u'c_igg', u'c_hcp']

In [5]: [i.encode('ascii', 'ignore') for i in cols]
Out[5]: ['q_igg', 'q_hcp', 'c_igg', 'c_hcp']

The problem here is that you would lose any special characters that cannot be encoded in ASCII.
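To illustrate that loss, here is a small sketch with a made-up column name containing non-ASCII characters. Note that on Python 3, encode() returns a bytes object rather than a str, so this approach is mainly relevant to Python 2.

```python
# Made-up column name: "gr", o-umlaut, sharp s, "e".
name = u'gr\u00f6\u00dfe'

# errors='ignore' silently drops every character that ASCII cannot represent.
encoded = name.encode('ascii', 'ignore')
print(encoded)  # b'gre' on Python 3
```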

A much dirtier solution would be to take the string representation of the list object and simply replace the u. I would not use that, but it might fit your needs in this special case ;-)

In [7]: repr(cols)
Out[7]: "[u'q_igg', u'q_hcp', u'c_igg', u'c_hcp']"
In [11]: repr(cols).replace("u", "")
Out[11]: "['q_igg', 'q_hcp', 'c_igg', 'c_hcp']"

see: https://docs.python.org/2/library/repr.html

PlagTag
  • 1
    Commenting on behalf of @AsheKetchum, who doesn't have enough rep: The downside of `.replace` is that it also removes any 'u' that is part of the original variable names, e.g. `"u'q_ugg'"` would become `"'q_gg'"` – Cory Klein Feb 16 '17 at 20:52
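The pitfall described in this comment can be reproduced with a literal string standing in for the Python 2 repr (the name q_ugg is the commenter's example; the narrower replacement is only an illustration, not a recommendation):

```python
# Literal stand-in for repr() of a Python 2 list of unicode column names.
py2_repr = "[u'q_ugg', u'c_hcp']"

# Removing every "u" also damages the column name itself.
stripped = py2_repr.replace("u", "")
print(stripped)  # ['q_gg', 'c_hcp']

# Replacing only "u'" keeps this name intact, but it would still
# truncate any name that ends in "u", so it remains a hack.
narrower = py2_repr.replace("u'", "'")
print(narrower)  # ['q_ugg', 'c_hcp']
```

Working with the strings in the list directly, rather than with the text of its repr, avoids this class of problem altogether.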
0

This will do the job:

list(df2)
Omkar Darves