
I have a dictionary (data_final) of dataframes (health, education, economy, ...). The dataframes contain data from one xlsx file. In one of the dataframes (economy), the column names have brackets and single quotes added to them.

data_final['economy'].columns = 
Index([                                ('Sr.No.',),
                                 ('DistrictName',),
                                  ('Agriculture',),
                            ('Forestry& Logging',),
                                      ('Fishing',),
                            ('Mining &Quarrying',),
                            ('ManufacturingMFG.',),
                               ('RegisteredMFG.',),
                                 ('Unregd. MFG.',),
                   ('Electricity,Gas & W.supply',),
                                 ('Construction',),
                    ('Trade,Hotels& Restaurants',),
                                     ('Railways',),
                      ('Transportby other means',),
                                      ('Storage',),
                                ('Communication',),
                           ('Banking &Insurance',),
       ('Real, Ownership of Dwel. B.Ser.& Legal',),
                         ('PublicAdministration',),
                                ('OtherServices',),
                                     ('TotalDDP',),
                           ('Population(In '00)',),
                        ('Per CapitaIncome(Rs.)',)],
      dtype='object')

I cannot reference any column using

data_final['economy']['('Construction',)']

gives error -

SyntaxError: invalid syntax

I tried to use replace to remove the brackets -

data_final['economy'].columns = pd.DataFrame(data_final['economy'].columns).replace("(","",regex=True))

But this does not fix the column names. How can I remove all these special characters from the column names?

Sнаđошƒаӽ
Rohan Bapat
  • I'm not really familiar with data frames, but `'('Construction',)'` isn't valid syntax because you can't ordinarily have quote marks inside a string literal that match the quote marks that surround it. What happens if you do `'(\'Construction\',)'` or `"('Construction',)"` instead? – Kevin Apr 26 '16 at 13:18

2 Answers

3

It looks as though your column names are being imported/created as tuples. What happens if you try to reference them by removing the brackets but leaving a comma on the end, like so:

data_final['economy']['Construction',]

or even with the brackets

data_final['economy'][('Construction',)]
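Both spellings should behave the same because, in Python, a trailing comma inside the square brackets already builds a one-element tuple, so the parentheses are optional. A minimal sketch in plain Python, where `columns` is an ordinary dict invented here as a stand-in for the column lookup (it is not the actual DataFrame):

```python
# Stand-in for the column lookup; keys are one-element tuples, values are made up.
columns = {('Construction',): 'construction data', ('Storage',): 'storage data'}

key_with_parens = ('Construction',)
key_without_parens = 'Construction',   # the trailing comma alone makes a tuple

print(type(key_without_parens).__name__)       # tuple
print(key_with_parens == key_without_parens)   # True

# Both subscripts therefore look up the same key.
print(columns['Construction',] == columns[('Construction',)])  # True
```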
Ed.
  • That indeed worked, thanks a lot! But I can only reference one column by leaving a comma on the end, as in data_final['economy']['Construction',]. How can I reference multiple columns? Leaving a comma doesn't work for multiple columns: data_final['economy']['Construction','Storage',] raises KeyError: ('Construction', 'Storage') – Rohan Bapat Apr 26 '16 at 13:25
  • I'm afraid I don't use dataframes enough to know if that is possible, but my gut says not, as it looks like the columns are using Python dictionaries. You may have to get the columns separately rather than all in one hit (although there may be another way of accessing multiple columns that I am unaware of). – Ed. Apr 26 '16 at 14:11
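One possible way around the multi-column problem is to unwrap the one-element tuples into plain strings first, after which the usual list-of-names selection should apply. The sketch below is plain Python; `flatten_labels` is a hypothetical helper written for illustration, and the pandas lines at the end are an assumed usage that is not executed here.

```python
def flatten_labels(labels):
    """Unwrap one-element tuples to their single item; leave other labels alone."""
    return [
        label[0] if isinstance(label, tuple) and len(label) == 1 else label
        for label in labels
    ]

# A few of the labels from the question, in their tuple form.
raw = [('Sr.No.',), ('DistrictName',), ('Construction',), ('Storage',)]
flat = flatten_labels(raw)
print(flat)  # ['Sr.No.', 'DistrictName', 'Construction', 'Storage']

# The subscript ['Construction','Storage',] builds ONE two-element key,
# which matches no column and is what the KeyError reports.
combined_key = 'Construction', 'Storage',
print(combined_key)  # ('Construction', 'Storage')

# Assumed usage with pandas (not run here):
# df = data_final['economy']
# df.columns = flatten_labels(df.columns)
# df[['Construction', 'Storage']]
```

Renaming once keeps later lookups free of tuple syntax, at the cost of changing the labels that were read from the file.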
0

The syntax error should be related to the line

('Population(In '00)',),

The string contains a single quotation mark, which would usually mark the end of the string. If you want to use one in a string, you have to surround the string with " or escape the quotation mark as \', resulting in a line like:

('Population(In \'00)',),

The same problem applies to your actual call, you have to escape the quotation mark there as well.
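As a small illustration of the quoting rules, the two ways of writing the quotation mark produce the same string. The sketch also shows, as a caveat, that a correctly quoted string which merely looks like a tuple is still a different object from the tuple itself, so fixing the quotes alone may not be enough if the column labels really are tuples. Variable names here are made up for the example.

```python
# Escaping the quote and switching to double quotes give identical strings.
escaped = 'Population(In \'00)'
double_quoted = "Population(In '00)"
print(escaped == double_quoted)  # True

# A string that looks like a tuple is not equal to the tuple it resembles.
as_string = "('Construction',)"
as_tuple = ('Construction',)
print(as_string == as_tuple)  # False
print(type(as_string).__name__, type(as_tuple).__name__)  # str tuple
```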

Klaus D.