
I have a bunch of tasks to distribute evenly across a date range.

Each task list contains 5 elements, except the final chunk, which contains between 1 and 5 elements.

The process I've put together outputs the following data structure:

[{'Project': array([['AAC789A'],
       ['ABL001A'],
       ['ABL001D'],
       ['ABL001E'],
       ['ABL001X']], dtype=object), 'end_date': '2020-10-01'}, 
{'Project': array([['ACZ885G_MA'],
       ['ACZ885H'],
       ['ACZ885H_MA'],
       ['ACZ885I'],
       ['ACZ885M']], dtype=object), 'end_date': '2020-10-02'}, 
 {'Project': array([['IGE025C']], dtype=object), 'end_date': '2020-10-03'}]

...but I really need the following format...

Project,end_date
AAC789A,2020-10-01
ABL001A,2020-10-01
ABL001D,2020-10-01
ABL001E,2020-10-01
ABL001X,2020-10-01
ACZ885G_MA,2020-10-02
ACZ885H,2020-10-02
ACZ885H_MA,2020-10-02
ACZ885I,2020-10-02
ACZ885M,2020-10-02
IGE025C,2020-10-03

I've looked at repeating and chaining using itertools, but I don't seem to be getting anywhere with it.

This is my first time working heavily with Python. How would this typically be accomplished in Python?

This is how I'm currently attempting to do this, but I get the error below.

df = pd.concat([pd.Series(row['end_date'], row['Project'].split(','))
                    for _, row in df.iterrows()]).reset_index()


AttributeError: 'numpy.ndarray' object has no attribute 'split'
Ben Gee

2 Answers


Here is a solution using NumPy's flatten method:

import pandas as pd
import numpy as np


data = [{'Project': np.array([['AAC789A'],
       ['ABL001A'],
       ['ABL001D'],
       ['ABL001E'],
       ['ABL001X']], dtype=object), 'end_date': '2020-10-01'}, 
{'Project': np.array([['ACZ885G_MA'],
       ['ACZ885H'],
       ['ACZ885H_MA'],
       ['ACZ885I'],
       ['ACZ885M']], dtype=object), 'end_date': '2020-10-02'}, 
 {'Project': np.array([['IGE025C']], dtype=object), 'end_date': '2020-10-03'}]

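# Flatten each (n, 1) Project array to 1-D so it can serve as a DataFrame column;
# the scalar end_date is broadcast to every row of that chunk.
def clean(di):
    return {'Project': di['Project'].flatten(), 'end_date': di['end_date']}

# Build one small DataFrame per chunk and stack them with a fresh 0..n-1 index.
result = pd.concat([pd.DataFrame(clean(d)) for d in data], ignore_index=True)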

result is a DataFrame that can be exported to CSV, for example with result.to_csv(index=False). It contains the following:

Project,end_date
AAC789A,2020-10-01
ABL001A,2020-10-01
ABL001D,2020-10-01
ABL001E,2020-10-01
ABL001X,2020-10-01
ACZ885G_MA,2020-10-02
ACZ885H,2020-10-02
ACZ885H_MA,2020-10-02
ACZ885I,2020-10-02
ACZ885M,2020-10-02
IGE025C,2020-10-03
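
As an alternative, the same flattening can be sketched without building one DataFrame per chunk, by pairing every project code with its chunk's end_date in a list comprehension. This is only a minimal sketch, assuming the data structure shown in the question (shortened here to two chunks); the names rows, flat and csv_text are illustrative.

```python
import numpy as np
import pandas as pd

# Same layout as the structure in the question, shortened to two chunks.
data = [
    {'Project': np.array([['AAC789A'], ['ABL001A']], dtype=object),
     'end_date': '2020-10-01'},
    {'Project': np.array([['IGE025C']], dtype=object),
     'end_date': '2020-10-03'},
]

# ravel() turns each (n, 1) array into a flat sequence of project codes;
# every code is paired with the end_date of the chunk it came from.
rows = [
    (project, chunk['end_date'])
    for chunk in data
    for project in chunk['Project'].ravel()
]

flat = pd.DataFrame(rows, columns=['Project', 'end_date'])

# index=False leaves out the row index, giving the two-column CSV layout.
csv_text = flat.to_csv(index=False)
```

Building the rows first keeps the pandas part to a single constructor call, which may be easier to follow when the chunks are small.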
0

I found an answer that met my need in the question linked below; MaxU's answer served me best.

Using the explode helper function from that answer, I was able to accomplish my goal with one line of code.

df2 = explode(df.assign(var1=df.Project.str.split(',')), 'Project')

Linked question: Split (explode) pandas dataframe string entry to separate rows
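
For reference, pandas 0.25 and later include a built-in DataFrame.explode method, which may make a custom helper unnecessary. The following is a minimal sketch, assuming each Project value is a NumPy array shaped like the ones in the question (shortened here to two chunks); the names df and exploded are illustrative and do not come from the linked answer.

```python
import numpy as np
import pandas as pd

# Same layout as the structure in the question, shortened to two chunks.
data = [
    {'Project': np.array([['AAC789A'], ['ABL001A']], dtype=object),
     'end_date': '2020-10-01'},
    {'Project': np.array([['IGE025C']], dtype=object),
     'end_date': '2020-10-03'},
]

# Convert each (n, 1) array to a flat Python list so that every
# Project cell holds a simple one-dimensional list of codes.
df = pd.DataFrame(
    [
        {'Project': chunk['Project'].ravel().tolist(),
         'end_date': chunk['end_date']}
        for chunk in data
    ]
)

# explode() emits one row per list element and repeats end_date;
# reset_index(drop=True) replaces the repeated index with 0..n-1.
exploded = df.explode('Project').reset_index(drop=True)
```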

Ben Gee