0

I have followed the below mentioned link for creating Sunburst chart in Python: How to make a sunburst plot in R or Python?

Attached is the notebook for reference.

However, the data is required in a specific format for function to create the chart (list nested by level). Example:

data = [
    ('/', 100, [
        ('home', 70, [
            ('Images', 40, []),
            ('Videos', 20, []),
            ('Documents', 5, []),
        ]),
        ('usr', 15, [
            ('src', 6, [
                ('linux-headers', 4, []),
                ('virtualbox', 1, []),

            ]),
            ('lib', 4, []),
            ('share', 2, []),
            ('bin', 1, []),
            ('local', 1, []),
            ('include', 1, []),
        ]),
    ]),
]
sunburst(data)

For the same example, if someone gives me a decision tree output in an excel file with node hierarchy as levels, is there a way to convert this excel output(find below) as list above so I can create Sunburst using given function.

Excel Output:

Level0,Level1,Level2,Level3,Volume
/,,,,15
/,home,Images,,40
/,home,Videos,,20
/,home,Documents,,5
/,home,,,5
/,usr,src,linux-headers,4
/,usr,src,virtualbox,1
/,usr,src,,1
/,usr,lib,,4
/,usr,share,,2
/,usr,bin,,1
/,usr,local,,1
/,usr,include,,1

4 Answers4

1

You can do it with pandas DataFrame & recursion:

import pandas as pd

def df_to_nested(dataframe, _groupby, level, col):
    """
    - dataframe: source data
    - _groupby: groupby columns
    - level: start from this level (0 by default)
    - col: value to aggregate
    """
    if len(dataframe) == 1:
        return [] # Reached max depth
    else:
        result = []
        df = dataframe.groupby(_groupby[level])
        level += 1 # Level0 -> Level1 (increase level)
        for key, val in df: # Iterate through groups
            result.append(tuple([key, val[col].sum(), df_to_nested(val, _groupby, level, col)]))
        level -= 1 # Level1 -> Level0 (decrease level)
        return result

df = pd.read_csv('test.csv') # Read your file

_groupby = ['Level0', 'Level1', 'Level2', 'Level3'] # Group by cols

result = df_to_nested(df, _groupby, 0, 'Volume')

print(result)

Sample output:

[
    ('/', 100, [
        ('home', 70, [
            ('Documents', 5, []),
            ('Images', 40, []),
            ('Videos', 20, [])
        ]),
        ('usr', 15, [
            ('bin', 1, []),
            ('include', 1, []),
            ('lib', 4, []),
            ('local', 1, []),
            ('share', 2, []),
            ('src', 6, [
                ('linux-headers', 4, []),
                ('virtualbox', 1, [])
            ])
        ])
    ])
]
Nam Nguyen
  • 11
  • 1
  • Thanks, this is a very nice solution. I just found a bug in it. When a level has only one variant, you do not include it in the output. I have fixed it and posted a new answer. If you would like you can update your answer with mine and I'd remove it. – Moh Dec 15 '18 at 20:51
1

Nam Nguyen's answer is very nice but has a minor bug in cases where at a specific level there is only one record, then this statement len(dataframe) == 1 becomes True and that one value for the specific level is not included in the result. I have updated his answer to count for that case as well:

def df_to_nested(dataframe, _groupby, level, col):
"""
- dataframe: source data
- _groupby: groupby columns
- level: start from this level (0 by default)
- col: value to aggregate
"""
result = []
if len(dataframe) == 1:        
    try:
        df = dataframe.groupby(_groupby[level])                    
        for key, val in df: # Iterate through groups                
            result.append(tuple([key, val[col].sum(), []]))
    except Exception: # Reached max depth
        pass
else:
    df = dataframe.groupby(_groupby[level])
    level += 1 # Level0 -> Level1 (increase level)
    for key, val in df: # Iterate through groups
        result.append(tuple([key, val[col].sum(), df_to_nested(val, _groupby, level, col)]))
    level -= 1 # Level1 -> Level0 (decrease level)

return result
Moh
  • 1,887
  • 3
  • 18
  • 29
1

I wanted to use the answer to this question for creating Plotly Sunburst in Javascript. However, the output was different than the data format in these examples. I wanted an output more like this answer, so I changed the code a bit and now I can use it for the Javascript in front-end.

I'll leave it here in case it was useful for someone.

def df_to_nested(dataframe, _groupby, level=0, col='count'):

    result = []

    if level == (len(_groupby) - 1):
        df = dataframe.groupby(_groupby[level])
        parent_cols = _groupby[:level]
        for key, val in df:  # Iterate through groups
            row = val.head(1)
            parents = "-".join(str(list(row[p])[0]) for p in parent_cols)
            
            result.append({'labels':key,'values':int(val[col].sum()),'parents':parents,
                           'ids':parents+"-"+str(key)})

    else:
        df = dataframe.groupby(_groupby[level])
        parent_cols = _groupby[:level]
        level += 1  # Level0 -> Level1 (increase level)
        for key, val in df:  # Iterate through groups
            if level==1:
                parents = ""
                ids=key
            else:
                row = val.head(1)
                parents = "-".join(str(list(row[p])[0]) for p in parent_cols)
                ids = parents+"-"+str(key)

            result.append({'labels':key,'values':int(val[col].sum()),'parents':parents,
           'ids':ids})
            result.extend(df_to_nested(val, _groupby, level, col))

        level -= 1  # Level1 -> Level0 (decrease level)

    return result

def get_sunburst_format(df,path):
    # path is the list of columns in dataframe that you want a sunburst from
    tmp = df.copy()
    tmp['count'] = 1
    sunburst_data = df_to_nested(tmp,path)
    sunburst_data = pd.DataFrame(sunburst_data)
    return {column:list(sunburst_data[column]) for column in sunburst_data.columns}
mishoolak
  • 43
  • 6
0

First of all, thank Nam Nguyen and MOH for their wonderful answers. Inspired by them, I revised the code again.
Only one line of samples can reach max depth.
I hope this answer is helpful to you.

import pandas as pd

def df_to_nested(dataframe, _groupby, level, col):
    """
    - dataframe: source data
    - _groupby: groupby columns
    - level: start from this level (0 by default)
    - col: value to aggregate
    """
    result = []
    if len(dataframe) == 1:
        try:
            df = dataframe.groupby(_groupby[level])                    
            level += 1  
            for key, val in df:
                result.append(tuple([key, val[col].sum(), df_to_nested(val, _groupby, level, col)]))
        except:  # Reached max depth
            pass
    else:
        df = dataframe.groupby(_groupby[level])
        level += 1  
        for key, val in df: 
            result.append(tuple([key, val[col].sum(), df_to_nested(val, _groupby, level, col)]))
    return result

df=pd.read_table('nest.txt',sep=',')
_groupby = ['Level0', 'Level1', 'Level2', 'Level3'] # Group by cols

result = df_to_nested(df, _groupby, 0, 'Volume')
result
Piers
  • 1
  • 1