De-aggregating a string-valued field in Pandas

Question

Is it possible to de-aggregate (or perhaps de-concatenate) a Pandas data frame based on a string field that contains a delimited list of items that you'd now like to see broken out? Similar to converting a dataset into 1NF if you were doing relational data modeling.

For example, is there a straightforward way to take this:

Name    Projects
Joe     ProjectA, ProjectB
Mary    ProjectC, ProjectD, Project E

and convert it into this:

Name    Projects
Joe     ProjectA
Joe     ProjectB
Mary    ProjectC
Mary    ProjectD
Mary    ProjectE

Wrong dup... finding better one – piRSquared Mar 08 '18 at 00:48 — piRSquared, Mar 08 '18 at 00:48

score 1 · Answer 1 · answered Mar 08 '18 at 00:43

1

Here is one way.

df = pd.DataFrame({'Name': ['Joe', 'Mary'],
                   'Projects': [['ProjectA', 'ProjectB'],
                                ['ProjectC', 'ProjectD', 'ProjectE']]})

lens = list(map(len, df['Projects'].values))

res = pd.DataFrame({'name': np.repeat(df['Name'], lens),
                    'Projects': np.concatenate(df['Projects'].values)})

#    Projects  name
# 0  ProjectA   Joe
# 0  ProjectB   Joe
# 1  ProjectC  Mary
# 1  ProjectD  Mary
# 1  ProjectE  Mary

answered Mar 08 '18 at 00:43

jpp

159,742
34
281
339

Nice! I didn't know about the `repeat` function for cloning out records. However in my original example the Projects field is string valued, so that e.g. the first record's value would be `'ProjectA, ProjectB'`. Would you just create a shadow field that splits this string out into a list value, and then apply the above approach? – Ramon Mar 08 '18 at 00:51

De-aggregating a string-valued field in Pandas

1 Answers1