4

I want to be able to concatenate string on several lines into one according to an ID. I use the library pandas (python 3).

val   id
Cat   1
Tiger 2
Ball  3
Bat   1
bill  2
dog   1

l = []
a = 0
while a < lendata:
    if df["id"][a] == 1:
        if a != 0:
            df["val"][tmp] = ' '.join(l)
            l = []
        tmp = a
        l.append(df["val"][a])
    else:
        l.append(df["val"][a])
    a += 1

It works with loops. i need this result,

val
Cat Tiger Ball
Bat bill
dog

not a group by

Question: Do you know how to do it with pandas functions? Thanks.

jb255
  • 43
  • 4
  • 4
    IIUC `df.groupby('id').val.apply(' '.join).reset_index()` – piRSquared May 28 '19 at 13:32
  • 1
    Btw, I understand you probably just got into `pandas`, but use of nested `for/if` loops is slow compared to the available pandas function. Invest some time to learn pandas and for 90% of your problems, pandas provides a solution for them in terms of functions/methods. – Erfan May 28 '19 at 13:36
  • @piRSquared we misunderstood, check his expected output – Erfan May 28 '19 at 13:43
  • 1
    @jb255 You should probably explain why the particular rows of `id` define a group. I'm guessing the consecutive nature is what does it. – piRSquared May 28 '19 at 13:45
  • 1
    Welcome to StackOverflow! I think you can now understand why it is essential to show the expected result and to provide a true [mcve] that others can just copy and paste :-) – Serge Ballesta May 28 '19 at 13:51

3 Answers3

8

Staying in pandas:

df['group'] = (df['id'] == 1).cumsum()
df.groupby('group')['val'].apply(' '.join).reset_index()
   id             val
0   1  Cat Tiger Ball
1   2        Bat bill
2   3             dog

The first line defines groups according to your definition. The second line is a standard groupby operation.

Erfan
  • 40,971
  • 8
  • 66
  • 78
IanS
  • 15,771
  • 9
  • 60
  • 84
  • 1
    You mind if I add dataframe to your answer? I was reproducing the same answer might as well add it to yours – Erfan May 28 '19 at 13:56
  • 1
    Or `df.groupby((df['id'] == 1).cumsum())['val'].apply(' '.join).reset_index()` – jezrael May 28 '19 at 14:05
  • 2
    I actually had exactly that answer, but it was too close to IanS his answer to post as a new answer @jezrael – Erfan May 28 '19 at 14:06
3

You can also create an array like so :

a = np.array(range(len(df)))

Then you create a third column which equals to your id minus the previous array. This third column will show you which val are together.

df['regroup'] = df['id'].subtract(a)

Out:

id  val regroup
0   1   Cat 1
1   2   Tiger   1
2   3   Ball    1
3   1   Bat -2

You can now use a group by to have your desired output :

In [1] : df.groupby(['regroup'])['val'].apply(' '.join)
Out[1] : regroup
-2               Bat 
 1    Cat Tiger Ball 
vlemaistre
  • 3,301
  • 13
  • 30
2

With np.split

  • Use np.diff and find where those differences are less than zero
  • np.split the val column at those positions

[*map(' '.join, np.split(df.val, np.flatnonzero(np.diff(df.id) < 0) + 1))]

['Cat Tiger Ball', 'Bat']

pd.Series([*map(' '.join, np.split(df.val, np.flatnonzero(np.diff(df.id) < 0) + 1))])

0    Cat Tiger Ball
1               Bat
dtype: object

Combined Wisdom

Using IanS's idea to check where id is equal to 1

[*map(' '.join, np.split(df.val, np.flatnonzero(df.id == 1)[1:]))]
piRSquared
  • 285,575
  • 57
  • 475
  • 624