
I have a CSV file that was parsed from XML, and it now has a multi-level header structure.

The df looks like this:

ID   category-1     category-2
     address1       address2
     label1         label2
1     CN             DN
      XN             YN
      XL             DL
2     CX             DN
      UC             UTC
      UX             UC

I want to flatten this data into a normal dataframe:

  ID   category1-address1-label1     category2-address2-label2
    1     CN                               DN
    1     XN                               YN
    1     XL                               DL
    2     CX                               DN
    2     UC                               UTC
    2     UX                               UC

I used `read_csv().reset_index()`, but that loses a lot of important information. Is there any way to turn the multi-level header rows into a normal, flat dataframe using pandas?

  • It seems like you want the `header` argument in your `read_csv()` call. For example, `header=[0,1,2]` will give you a MultiIndex in the columns. Otherwise, you can use `skiprows` to skip the headers entirely and manually add them later – G. Anderson Jul 29 '21 at 21:34
  • If you go the multiindex route, [this question and answers](https://stackoverflow.com/questions/24290297/pandas-dataframe-with-multiindex-column-merge-levels) can show you how to collapse the multiindex into the format you posted – G. Anderson Jul 29 '21 at 21:36
  • I used `skiprows`. It keeps the original names, but my columns now include names like data.1, code.1, address.1. Why is the .1 there? Also, when I tried `header=[0,1,2]`, the data itself was still hierarchical. How can I flatten it so that each cell holds one value? – LearningCode Jul 29 '21 at 22:28
  • `.1` is usually appended as an identifier if you have more than 1 column with the same name. If you [edit] to include a sample of what you're actually seeing then we might be able to provide better help. Based on your provided sample, I'm able to use `header=[0,1,2], index_col=[0]` in `read_csv` and then `df.columns=df.columns.map('-'.join).str.strip('-')` as in the linked answer above to achieve your result without any appearance of `.1` – G. Anderson Jul 30 '21 at 23:02
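
The suggestions in the comments above can be put together roughly as follows. This is only a sketch: the CSV text is a hypothetical reconstruction of the sample in the question (the real file may be laid out differently), and `flatten` is an illustrative helper, not a pandas function. It reads the three header rows as a MultiIndex, joins the levels with `-`, and forward-fills the `ID` column, assuming a blank ID means "same as the row above".

```python
import io

import pandas as pd

# Hypothetical CSV text mirroring the sample in the question.
csv_text = """ID,category-1,category-2
,address1,address2
,label1,label2
1,CN,DN
,XN,YN
,XL,DL
2,CX,DN
,UC,UTC
,UX,UC
"""

# Read the first three rows as a three-level column MultiIndex.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1, 2])


def flatten(levels):
    """Join header levels with '-', skipping pandas' 'Unnamed: ...' fillers."""
    parts = [str(p) for p in levels if not str(p).startswith("Unnamed")]
    return "-".join(parts)


# Collapse each column tuple into a single string label.
df.columns = [flatten(col) for col in df.columns]

# Assumption: an empty ID cell belongs to the ID above it.
df["ID"] = df["ID"].ffill().astype(int)

print(df)

# Separately, the '.1' suffix: with a single header row, read_csv renames
# repeated column names so that every label is unique.
dupes = pd.read_csv(io.StringIO("data,code,data\n1,2,3\n"))
print(list(dupes.columns))
```

The blank cells under `ID` in the second and third header rows are read as placeholder names beginning with `Unnamed`, which is why the helper filters them out rather than using a plain `'-'.join`. Passing `index_col=[0]`, as suggested in the last comment, is an alternative that moves `ID` into the index instead.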
