Merge certain rows in a DataFrame based on startswith

Question

I have a DataFrame, in which I want to merge certain rows to a single one. It has the following structure (values repeat)

Index   Value
1      date:xxxx
2      user:xxxx
3      time:xxxx
4      description:xxx1
5      xxx2
6      xxx3
7      billed:xxxx
...

Now the problem is, that the columns 5 & 6 still belong to the description and were separated just wrong (whole string separated by ","). I want to merge the "description" row (4) with the values afterwards (5,6). In my DF, there can be 1-5 additional entries which have to be merged with the description row, but the structure allows me to work with startswith, because no matter how many rows have to be merged, the end point is always the row which starts with "billed". Due to me being very new to python, I haven´t got any code written for this problem yet.

My thought is the following (if it is even possible):

Look for a row which starts with "description" → Merge all the rows afterwards till reaching the row which starts with "billed", then stop (obviosly we keep the "billed" row) → Do the same to each row starting with "description"

New DF should look like:

Index   Value
1      date:xxxx
2      user:xxxx
3      time:xxxx
4      description:xxx1, xxx2, xxx3
5      billed:xxxx
...

[jfyi](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). could you show expected `df`? — Danila Ganchar, Sep 03 '20 at 18:04
I think is not possible using `pandas`. Only through loops/iterations. — Danila Ganchar, Sep 04 '20 at 11:13
Could you help me by giving me a structure of such a loop/iteration? — Artem Vorontsov, Sep 04 '20 at 11:19

score 0 · Answer 1 · answered Sep 04 '20 at 11:57

df = pd.DataFrame.from_dict({'Value': ('date:xxxx', 'user:xxxx', 'time:xxxx', 'description:xxx', 'xxx2', 'xxx3', 'billed:xxxx')})
records = []
description = description_val = None

for rec in df.to_dict('records'):  # type: dict
    # if previous description and record startswith previous description value
    if description and rec['Value'].startswith(description_val):
        description['Value'] += ', ' + rec['Value']  # add record Value into previous description
        continue
    # record with new description...
    if rec['Value'].startswith('description:'):
        description = rec
        _, description_val = rec['Value'].split(':')
    elif rec['Value'].startswith('billed:'):
        # billed record - remove description value
        description = description_val = None

    records.append(rec)

print(pd.DataFrame(records))

#                          Value
# 0                    date:xxxx
# 1                    user:xxxx
# 2                    time:xxxx
# 3  description:xxx, xxx2, xxx3
# 4                  billed:xxxx

Merge certain rows in a DataFrame based on startswith

1 Answers1