0

G'day!

I am trying to find the best way to convert the following data from a dataframe into YAML.

Region    Country State       City      County    Address        Rep
0   AMER        USA    CA       Brea      Orange   Street 1      Mr. X
1   AMER        USA    CA       Brea      Orange   Street 2      Mr. Y
2   AMER        USA    CA       Brea      Orange   Street 3   Mrs. Doe
3   AMER        USA    CA       Brea  Buena Park   Street 1   Person 1
4   AMER        USA    CA       Brea  Buena Park   Street 2   Person 2
5   AMER        USA    CA       Brea  Buena Park   Street 3   Person 3
6   AMER        USA    NY        NYC       Bronx  Street 40   Person 4
7   EMEA         UK     -     London        Kent  Street 10  Mrs. Jane
8   APAC  Singapore     -  Singapore      Sabana  Street 20    Mrs. Xi

In pandas, I can use pivot table to get this structure

df.pivot_table(index=['Region','Country','State','City','County','Address'],aggfunc='first')


                                                             Rep
Region Country   State City      County     Address
AMER   USA       CA    Brea      Buena Park Street 1    Person 1
                                            Street 2    Person 2
                                            Street 3    Person 3
                                 Orange     Street 1       Mr. X
                                            Street 2       Mr. Y
                                            Street 3    Mrs. Doe
                 NY    NYC       Bronx      Street 40   Person 4
APAC   Singapore -     Singapore Sabana     Street 20    Mrs. Xi
EMEA   UK        -     London    Kent       Street 10  Mrs. Jane`

But how do I take this to YAML (in Pandas way!) in order to achieve the following format:

 - Region: AMER
   - Country: USA
   - State: 
    - NY:
      - City:
        - NYC:
          - County: 
             - Bronx
          - Address:
            - Street40
          - Rep: 
            - Person 4
    - CA:
      - City:
        - Brea:
        - County:
          - Orange
          - Buena Park
        - Address:
          - Street 1
          - Street 2
          - Street 3
          - Street 1
          - Street 2
          - Street 3
          - Street 4
        - Rep:
          - Mr. X
          - Mr. Y
          - Mrs. Doe
          - Person 1
          - Person 2
          - Person 3

One way I found doing this is doing df.iterrows() and then handling the data manually into dictionary and then to yaml, but I just don't like this approach - there must be more elegant 'pandas' way of doing it, if that makes sense :-)

Any help will be much appreciated.

Danail Petrov
  • 1,875
  • 10
  • 12
  • 1
    Maybe this [SO post](https://stackoverflow.com/questions/46205399/how-to-generate-n-level-hierarchical-json-from-pandas-dataframe) helps – above_c_level Mar 25 '20 at 17:29
  • Thanks fella. I am currently using very similar approach but I was wondering if there is something more elegant :-) Thanks for tipping in anyway! – Danail Petrov Mar 25 '20 at 18:39

0 Answers0