
I have a dataframe like the one shown below

Date,cust,region,Abr,Number,,,dept
12/01/2010,Company_Name,Somecity,Chi,36,136,NaN,sales
12/02/2010,Company_Name,Someothercity,Nyc,156,NaN,41,mfg

df = pd.read_clipboard(sep=',')

I am trying to write each (cust, region) group to a separate sheet of an Excel file

writer = pd.ExcelWriter('duck_data.xlsx',engine='xlsxwriter')
for (cust,reg), v in df.groupby(['cust','region']):
    v.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}",index=False, columns = modified_col_list)
writer.save()

but the problem is that when the file is written, pandas uses `Unnamed: 5`, `Unnamed: 6` to represent the empty column names. So I created a `modified_col_list` as shown below and passed it to the `columns` parameter of `to_excel`

original_col_list = ['Date','cust','region','Abr','Number',nan,nan,'dept']

modified_col_list = ['Date','cust','region','Abr','Number',' ',' ','dept']

My objective is to keep the empty column names as-is during the Excel write itself. But this resulted in the error below

KeyError: "Not all names specified in 'columns' are found"
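The reason for the error (as a later comment also explains) is that the `columns` argument of `to_excel` selects which existing dataframe columns to export; it does not rename them, so passing a name that is not in the frame raises this KeyError. A minimal sketch with hypothetical data:

```python
import pandas as pd

# minimal sketch with hypothetical data: `columns` selects existing
# dataframe columns by name; it does not rename them
df = pd.DataFrame({"a": [1], "b": [2]})

try:
    # "c" is not a column of df, so pandas raises before writing anything
    df.to_excel("out.xlsx", index=False, columns=["a", "c"])
except KeyError as e:
    print(e)  # "Not all names specified in 'columns' are found"
```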

I expect my output to look like the screenshot below (note the column names are empty)

[screenshot: expected Excel output with empty column headers]

The Great
  • when I write the excel file, I would like to leave it as is (empty column names) – The Great Mar 02 '22 at 04:42
  • @LeiYang - There are like 150 columns. Do I have to create a list of 150 column names? – The Great Mar 02 '22 at 04:52
  • Use loops, of course - I believe you can do that. Or [Create list of single item repeated N times](https://stackoverflow.com/a/3459131/1518100) – Lei Yang Mar 02 '22 at 04:52
  • @LeiYang when I use a list as shown above in my post, I get this error - `KeyError: "Not all names specified in 'columns' are found"` – The Great Mar 02 '22 at 07:18
  • I was making a mistake earlier: the `columns` parameter in `to_excel` filters which columns **in df** to export to Excel. – Lei Yang Mar 02 '22 at 07:34

1 Answer


You can use:

writer = pd.ExcelWriter('duck_data.xlsx', engine='xlsxwriter')
for (cust, reg), v in df.groupby(['cust', 'region']):
    #if a column name contains `Unnamed`, replace it with an empty string
    #v.columns = ['' if 'Unnamed' in x else x for x in v.columns]
    #if column names have missing values, replace them with empty strings
    v.columns = v.columns.to_series().fillna('')
    #removed the `columns` parameter
    v.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}", index=False)
writer.save()

Another idea is to change the column names before the groupby:

#if a column name contains `Unnamed`, replace it with an empty string
#df.columns = ['' if 'Unnamed' in x else x for x in df.columns]
#if column names have missing values, replace them with empty strings
df.columns = df.columns.to_series().fillna('')

writer = pd.ExcelWriter('duck_data.xlsx', engine='xlsxwriter')
for (cust, reg), v in df.groupby(['cust', 'region']):
    #removed the `columns` parameter
    v.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}", index=False)
writer.save()
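As a quick check, the `fillna('')` replacement can be verified on a small frame (hypothetical data mimicking the NaN column labels produced on read):

```python
import numpy as np
import pandas as pd

# hypothetical frame whose header contains missing labels, as after read
df = pd.DataFrame([[36, 136, 'sales']], columns=['Number', np.nan, 'dept'])

# replace the missing labels with empty strings before writing
df.columns = df.columns.to_series().fillna('')
print(list(df.columns))  # ['Number', '', 'dept']
```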
jezrael
  • I got an error `TypeError: argument of type 'numpy.float64' is not iterable`. I checked `type(df.columns)`; it returns Index. So I changed it to a list with `df.columns.tolist()` and then used the list-comprehension loop, but still the same error – The Great Mar 02 '22 at 07:25
  • @TheGreat - can you test `v.columns = ['' if 'Unnamed' in x else x for x in v.columns.to_series().fillna('')]` ? – jezrael Mar 02 '22 at 07:28
  • @TheGreat - Or simpler: `v.columns = v.columns.to_series().fillna('')` – jezrael Mar 02 '22 at 07:33
  • but column should not have missing values right? Because pandas by default fill missing values with `unnamed:..` when read the input file – The Great Mar 02 '22 at 07:34
  • @TheGreat - yes, you can write the string `' '` instead of an empty string. But because there are 2 columns `' ', ' '`, pandas reads the Excel file back as `' ', ' '.1` - it deduplicates columns with the same names. – jezrael Mar 02 '22 at 07:37
  • @TheGreat - if empty strings, then the `Unnamed` replacement is still necessary - pandas doesn't like empty column names ;) – jezrael Mar 02 '22 at 07:38