3

I have a dataframe that looks like

       column1    column2     column3    colum4  column5 
1      r_n_1      r_s_1       r_n_2      r_s_3   r_n_3
2      r_n_1      r_s_1       r_n_4      r_s_4   r_n_5
3      r_n_1      r_s_1       r_n_6      r_s_5   r_n_7
4      r_n_1      r_s_1       r_n_6      r_s_6   r_n_9
5      r_n_10     r_s_7       r_n_11     r_s_8   r_n_12
6      r_n_10     r_s_9       r_n_11     r_s_10  r_n_13

And I would like to merge cells in data frame so I could write in excel that could look likeenter image description here

So basically merge cells that have same value in excel. I am guessing I can use MultiIndex from pandas but I don't know how to do that.

my code to get this data frame is like.

 new_list = []
    for k1 in remove_empties_from_dict(combined_dict):
     curr_dict = remove_empties_from_dict(combined_dict)[k1]
         for k2 in curr_dict:
              curr_dict_2=curr_dict[k2]
                for k3 in curr_dict_2:
                    curr_dict_3=curr_dict_2[k3]
                       for k4 in curr_dict_3:
                            curr_dict_4=curr_dict_3[k4]
                                new_dict= {'c1': k1, 'c2': k2, 'c3': k3, 'c4': k4,'c5': curr_dict_4}
new_list.append(new_dict)
df = pd.DataFrame(new_list)
Noob Programmer
  • 698
  • 2
  • 6
  • 22

1 Answers1

1

I couldn't find a direct function that does the merging of cells with similar values so instead, I have written a code that does that.

print(df)

  column1 column2 column3 column4 column5
0   r_n_1   r_s_1   r_n_2   r_s_3   r_n_3
1   r_n_1   r_s_1   r_n_4   r_s_4   r_n_5
2   r_n_1   r_s_1   r_n_6   r_s_5   r_n_7
3   r_n_1   r_s_1   r_n_6   r_s_6   r_n_9
4  r_n_10   r_s_7  r_n_11   r_s_8  r_n_12
5  r_n_10   r_s_9  r_n_11  r_s_10  r_n_13

This is the df I have to work with. But in order to do that what I did was, I iterated it once to check which values are similar, and replaced the with a -. The reason I did not make it NoneType was because the cells below the table have a NoneType value, so the further part of the code will keep on iterating infinitely. What I did was:

for i in df.columns:
    for j in range(len(df[i])):

        for k in range(j+1,len(df[i])):
            if df[i][j]== df[i][k]:                
                df[i][k]='-' 

So now my df looks something like:

print(df)

  column1 column2 column3 column4 column5
0   r_n_1   r_s_1   r_n_2   r_s_3   r_n_3
1       -       -   r_n_4   r_s_4   r_n_5
2       -       -   r_n_6   r_s_5   r_n_7
3       -       -       -   r_s_6   r_n_9
4  r_n_10   r_s_7  r_n_11   r_s_8  r_n_12
5       -   r_s_9       -  r_s_10  r_n_13

Now that I have all unique values in the Data Frame, I will check whether the df element is a valid input or a -. And the cells that are - will be merged with its upper value. I did that by:

from openpyxl.workbook import Workbook    
exportPath = r'C:\Users\T01144\Desktop\PythonExport.xlsx'

wb= Workbook()
ws=wb.active
rowInd=1
colInd=1
colList=['-', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H','I'] # Continue if there are more columns  

for i in df.columns:
    for j in range(0,len(df[i])):
        if(df[i][j]!='-'):
            ws.cell(row=rowInd,column=colInd,value=df[i][j])            
        else:
            count=0
            for l in range(j+1,len(df[i])):
                count+=1
                if df[i][l]!='-':
                    count-=1
                    break
            ws.merge_cells(str(str(colList[colInd]+str(rowInd-1))+":"+str(colList[colInd]+str(rowInd+count))))
        rowInd+=1

    colInd+=1
    rowInd=1  

The output I have now is:

Excel Output

The entire code can be found here.

NOTE: Some of you may get this error after creating the Excel:

We found a problem with some content in 'PythonExport.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

Just ignore this error and click Yes.

Rohan Pillai
  • 917
  • 3
  • 17
  • 26
  • 1
    So for you does not work `df.set_index(['c0','c1','c2','c3']).to_excel('file.xlsx')` ? – jezrael Jul 05 '18 at 09:04
  • @jezrael No, that just arranges the columns in the specified format. It does not merge the cells that have the same value. – Rohan Pillai Jul 05 '18 at 09:09
  • Thank you so much for your effort. You really did a lot. I appreciate it. Could you please explain what do you do with indf=0? – Noob Programmer Jul 05 '18 at 09:10
  • @NoobProgrammer I am sorry, that was a variable I used to debug the code while I was writing it. It is not a part of the final code. I have removed it in the edit. – Rohan Pillai Jul 05 '18 at 09:13
  • It gives me this : A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy df[i][k]='-' and it doesnt move forward from this point. – Noob Programmer Jul 05 '18 at 09:21
  • @NoobProgrammer Have a look at this https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas. It is specified that you can switch these warnings off by `pd.options.mode.chained_assignment = None` – Rohan Pillai Jul 05 '18 at 09:23
  • It says No module named dataframe in the command line "from openpyxl.utils.dataframe import dataframe_to_rows" even though i have openpyxl installed. – Noob Programmer Jul 05 '18 at 09:57
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/174402/discussion-between-rohan-pillai-and-noob-programmer). – Rohan Pillai Jul 05 '18 at 10:01