4

This seems relatively straight forward, but I have yet to find a duplicate that answers my question, or a method with the needed functionality.

I have an Excel spreadsheet where each column that contains data has a unique header. I would like to use pandas to get the letter key of the column by passing this header string into a function.

For example, if I pass "Output Parameters" to the function, I would like to return "M":

enter image description here

The closest thing that I have found is the xlsxwriter.utility.xl_col_to_name(index) method outlined in the second answer in Convert spreadsheet number to column letter

This is very similar to what I am trying to do, however the column numbers within my sheets will not remain constant (unlike the headers, which will). This being said, a method that can return the column number based on the header would also work, as I would then be able to apply the above.

Does pandas or xlsxwriter have a method that can handle the above case?

ChaddRobertson
  • 605
  • 3
  • 11
  • 30
  • What is your use case? If you load a pandas dataframe in from the middle of a sheet, eg using `usecols` to select the columns and `skiprows` to omit some rows, would you want to return the oiriginal column letters? Or just treat the first column in the dataframe as column "A" – psychemedia Sep 29 '21 at 08:42
  • @psychemedia Yes, the original column letters are needed. The point of the spreadsheet is to act as a 'build your own manufacturing calculator', and should thus be as customizable as possible (therefore, very little is assumed static, making omission difficult). The answer below is exactly what was needed. – ChaddRobertson Sep 29 '21 at 08:50
  • The selected answer doesn't do that... it predicts what the column letter is from a dataframe if the dataframe were saved to a new Excel file. If the dataframe is created from a spreadsheet loaded with `usecols=["B:E"]`, *xlsxwriter* will identify the first col as `"A"` as it would be if the dataframe were written to a new spreadsheet. – psychemedia Sep 29 '21 at 08:56
  • @psychemedia I'm not understanding the difference between the two in that case - could you please elaborate if you have time? I have implemented the below solution and the output is as expected. Perhaps my original question was worded poorly. – ChaddRobertson Sep 29 '21 at 08:59
  • 1
    It is possible to load data into a dataframe from the middle of a sheet. For example, a table may appear in a spreadsheet starting at D1, not A1, and spanning cols D:G. The `usecols=D:G` params would let you load in just those cols. If you then saved the df, or used `xlsxwriter.utility.xl_col_to_name` on the dataframe, it would return as a sheet with cols A:D. I was trying to clarify if you needed to identify the columns from the original spreadsheet. – psychemedia Sep 29 '21 at 09:27
  • 1
    I't' also worth noting that all the `xlsxwriter` utility function is doing it taking a column index number retrieved from the dataframe and converting it to a column name form: https://github.com/jmcnamara/XlsxWriter/blob/550b9c5bd678c861dcc9f6f4072b33a69566e065/xlsxwriter/utility.py#L70 – psychemedia Sep 29 '21 at 09:33
  • @psychemedia I'm understanding now, thank you. So that doesn't have an impact on what I'm trying to do, as the original spreadsheet and the dataframe I am reading have the same column positions. Though, you have brought up a very important consideration for anybody who may stumble across this question in the future. Thank you for the fantastic clarification. – ChaddRobertson Sep 29 '21 at 09:44

1 Answers1

7

You can try:

import xlsxwriter
col_no = df.columns.get_loc("col_name")
print(xlsxwriter.utility.xl_col_to_name(col_no))
Muhammad Hassan
  • 4,079
  • 1
  • 13
  • 27