concatenate several lines in one with a delimiter with pandas

Question

I want to be able to concatenate string on several lines into one according to an ID. I use the library pandas (python 3).

val   id
Cat   1
Tiger 2
Ball  3
Bat   1
bill  2
dog   1

l = []
a = 0
while a < lendata:
    if df["id"][a] == 1:
        if a != 0:
            df["val"][tmp] = ' '.join(l)
            l = []
        tmp = a
        l.append(df["val"][a])
    else:
        l.append(df["val"][a])
    a += 1

It works with loops. i need this result,

val
Cat Tiger Ball
Bat bill
dog

not a group by

Question: Do you know how to do it with pandas functions? Thanks.

Btw, I understand you probably just got into `pandas`, but use of nested `for/if` loops is slow compared to the available pandas function. Invest some time to learn pandas and for 90% of your problems, pandas provides a solution for them in terms of functions/methods. — Erfan, May 28 '19 at 13:36
@jb255 You should probably explain why the particular rows of `id` define a group. I'm guessing the consecutive nature is what does it. — piRSquared, May 28 '19 at 13:45
Welcome to StackOverflow! I think you can now understand why it is essential to show the expected result and to provide a true [mcve] that others can just copy and paste :-) — Serge Ballesta, May 28 '19 at 13:51

score 8 · Accepted Answer · edited May 28 '19 at 14:05

8

Staying in pandas:

df['group'] = (df['id'] == 1).cumsum()
df.groupby('group')['val'].apply(' '.join).reset_index()

   id             val
0   1  Cat Tiger Ball
1   2        Bat bill
2   3             dog

The first line defines groups according to your definition. The second line is a standard groupby operation.

edited May 28 '19 at 14:05

Erfan

40,971
8
66
78

answered May 28 '19 at 13:52

IanS

15,771
9
60
84

1

You mind if I add dataframe to your answer? I was reproducing the same answer might as well add it to yours – Erfan May 28 '19 at 13:56
1

Or `df.groupby((df['id'] == 1).cumsum())['val'].apply(' '.join).reset_index()` – jezrael May 28 '19 at 14:05
2

I actually had exactly that answer, but it was too close to IanS his answer to post as a new answer @jezrael – Erfan May 28 '19 at 14:06

vlemaistre · Answer 2 · 2019-05-28T14:23:48.833

3

You can also create an array like so :

a = np.array(range(len(df)))

Then you create a third column which equals to your id minus the previous array. This third column will show you which val are together.

df['regroup'] = df['id'].subtract(a)

Out:

id  val regroup
0   1   Cat 1
1   2   Tiger   1
2   3   Ball    1
3   1   Bat -2

You can now use a group by to have your desired output :

In [1] : df.groupby(['regroup'])['val'].apply(' '.join)
Out[1] : regroup
-2               Bat 
 1    Cat Tiger Ball

edited May 28 '19 at 14:23

answered May 28 '19 at 14:04

vlemaistre

3,301
13
30

2

Interesting solution, quite creative! – Erfan May 28 '19 at 14:07
2

Quite elegant! You could also do it in one line: `df['id'].subtract(np.arange(len(df)))`... – IanS May 28 '19 at 14:10

piRSquared · Answer 3 · 2019-05-28T13:58:53.873

2

With `np.split`

Use np.diff and find where those differences are less than zero
np.split the val column at those positions

[*map(' '.join, np.split(df.val, np.flatnonzero(np.diff(df.id) < 0) + 1))]

['Cat Tiger Ball', 'Bat']

pd.Series([*map(' '.join, np.split(df.val, np.flatnonzero(np.diff(df.id) < 0) + 1))])

0    Cat Tiger Ball
1               Bat
dtype: object

Combined Wisdom

Using IanS's idea to check where id is equal to 1

[*map(' '.join, np.split(df.val, np.flatnonzero(df.id == 1)[1:]))]

edited May 28 '19 at 13:58

answered May 28 '19 at 13:48

piRSquared

285,575
57
475
624

1

You're going to scare newcomers! – IanS May 28 '19 at 13:52
1

Should we add "Don't contribute scary answers" to the 'Welcome Wagon' policy? @IanS – piRSquared May 28 '19 at 13:54
On the other hand it's always nice to watch some nice numpy-fu ;) – IanS May 28 '19 at 13:57
@jb255 Don't feel any obligation to accept my answer because I was first. If IanS's answer suits you better, please accept his. If you like my answer better than by all means do what you like (-: – piRSquared May 28 '19 at 14:19

concatenate several lines in one with a delimiter with pandas

3 Answers3

With np.split

Combined Wisdom

With `np.split`