Find Excel/CSV File's Actual Column References- Python Pandas

Question

Let say I have an excel sheet like this,

If I read this file in pandas, I can get Column1, Column2, Column3 as headers.

However, I want to know/create an output possibly a dictionary that is like this,

{Column1: 'A', Column2: 'B', Column3: 'C'}

The reason is I have another dictionary from master mapping file (that already had the references for each column done manually) that has all the references to each Column like this,

{Column1: 'A', Column2: 'B', Column3: 'C', Column4: 'D'}

This way, I can cross check keys and values and then if there is any mismatch, I can identify those mismatches. How can I get the original column name such as A for Column1 etc.. while reading a file into pandas?? Any ideas??

There is `usecols` in this `pandas` [link](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html) But does not help much in my case. — user9431057, Jul 09 '18 at 16:37

jpp · Accepted Answer · 2018-07-09T17:52:46.197

1

You can use dict with zip to map column names to letters. Assumes you have a maximum of 26 columns.

from string import ascii_uppercase

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  columns=['Column1', 'Column2', 'Column3'])

d = dict(zip(df.columns, ascii_uppercase))

print(d)

{'Column1': 'A', 'Column2': 'B', 'Column3': 'C'}

For more than 26 columns, you can adapt the itertools.product solution available in Repeating letters like excel columns?

edited Jul 09 '18 at 17:52

answered Jul 09 '18 at 16:56

jpp

159,742
34
281
339

1

Nice way to think about this. I had no idea `ascii_uppecase` existed! – user9431057 Jul 09 '18 at 17:12
I was reading `ascii_uppercase`, my understanding is I am stuck at 26 columns for now unless, there is a way to concatenate `ascii_uppercase` characters to go more than 26 columns. i.e. In excel, after `Z`, it is `AA`, `AB` etc.. – user9431057 Jul 09 '18 at 17:51

score 0 · Answer 2 · answered Jul 09 '18 at 16:51

0

You can use the Panadas rename method to replace the dataframe column names using your existing mapping dictionary:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rename.html

import pandas as pd

df = pd.DataFrame({'Column1': [1, 2], 'Column2': [3, 4], 'Column3': [5, 6]})

existing_mapping = {'Column1': 'A', 'Column2': 'B', 'Column3': 'C', 'Column4': 'D'}

df = df.rename(columns=existing_mapping)

answered Jul 09 '18 at 16:51

EnTwiZle

23
4

this will not help for direct comparing though! – user9431057 Jul 09 '18 at 17:11

Find Excel/CSV File's Actual Column References- Python Pandas

2 Answers2