Combine all columns in a DataFrame (pandas) - Python 3

Question

I am trying to combine all data from all columns into one column. Like this:

A B C                X
1 2 3               123
3 2 1      -->      321 
5 6 7               567

Bear in mind that the number of columns and rows is not known in advance.

I tried to solve it like this, but it doesn't work:

db.assign(sum = db.apply(''.join, axis = 1)).drop([db.index], axis = 1)

Thanks in advance

score 4 · Accepted Answer · answered Dec 19 '20 at 03:05

Does this work?

df['X'] = df.astype(str).sum(axis=1).astype(int)

Given:

import pandas as pd
df = pd.DataFrame({'A':[1,3,5],'B':[2,2,6],'C':[3,1,7]})

Results:

   A  B  C    X
0  1  2  3  123
1  3  2  1  321
2  5  6  7  567

Scott Boston

Cool one (already upvoted), but string concatenation is slow: it builds a new string for each concatenation. It may not be efficient for large DataFrames. – Ch3steR Dec 19 '20 at 04:33
@Ch3steR I agree! There are more efficient ways. – Scott Boston Dec 19 '20 at 04:39
`df.astype(str).agg(''.join, axis=1)` might be faster; `sum` performs pairwise concatenation, and because strings are immutable each step builds a new string. – cs95 Dec 19 '20 at 05:59
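
The join-based variant from the comment above can be tried on the sample frame from the answer. This is only a minimal sketch: it assumes the values are non-negative integers, and the intermediate name `joined` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 2, 6], 'C': [3, 1, 7]})

# Convert every cell to a string, then join each row's strings with a single
# str.join call instead of repeated pairwise concatenation.
joined = df.astype(str).agg(''.join, axis=1)

# Convert back to integers; drop this step to keep the strings.
df['X'] = joined.astype(int)
print(df)
```

Because the row is joined as text, this version also handles multi-digit cells, which the arithmetic approach does not.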

Ch3steR · Answer 2 · 2020-12-19T04:27:59.123

3

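We can use np.geomspace with df.mul, then df.sum. Each column is weighted by its decimal place value (100, 10, 1 for three columns), so this assumes every value is a single digit: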

c = df.shape[1]
end = 10**(c-1)
df['X'] = df.mul(np.geomspace(end, 1, num=c)).sum(1)

   A  B  C      X
0  1  2  3  123.0
1  3  2  1  321.0
2  5  6  7  567.0
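
The float column above comes from np.geomspace, which returns floating-point weights. A possible variation (not the answer's own code) builds the place values as integers and uses a dot product. It is a sketch under the same assumption that every cell is a single digit from 0 to 9, and the name `weights` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5], 'B': [2, 2, 6], 'C': [3, 1, 7]})

c = df.shape[1]
# Integer place values 10**(c-1), ..., 10, 1 (here 100, 10, 1).
weights = 10 ** np.arange(c - 1, -1, -1)

# The dot product of each row with the weights stays in integer arithmetic,
# so the new column is an integer column rather than a float one.
df['X'] = df.to_numpy() @ weights
print(df)
```

With a multi-digit cell the arithmetic and the string result differ: the row 1, 23, 4 gives 1*100 + 23*10 + 4 = 334, whereas string concatenation gives 1234. With more than 18 columns the result can overflow a 64-bit integer.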

Timeit analysis:

Benchmarking setup:

vals = np.random.randint(2, 10, (100_000, 6))
df = pd.DataFrame(vals)

In [93]: %%timeit
    ...: df.astype(str).sum(axis=1).astype(int)
    ...: 
    ...: 
419 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [94]: %%timeit
    ...: c = df.shape[1]
    ...: end = 10**(c-1)
    ...: df.mul(np.geomspace(end, 1, num=c)).sum(1)
    ...: 
    ...: 
8.02 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
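
Outside IPython, the comparison can be approximated with the standard timeit module. The sketch below is an assumption-laden stand-in rather than a reproduction of the cells above: the function names are hypothetical, it uses a smaller frame, and it times the join-based string variant and an integer-weight version of the arithmetic. Timings depend on the machine and library versions, so no numbers are claimed.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Single digits 2..9, as in the setup above, but with fewer rows so the
# script finishes quickly.
df = pd.DataFrame(rng.integers(2, 10, size=(10_000, 6)))


def concat_strings(frame):
    """Join each row's values as text, then parse the text as an integer."""
    return frame.astype(str).agg(''.join, axis=1).astype(int)


def concat_arithmetic(frame):
    """Weight each column by its decimal place value and sum across the row."""
    c = frame.shape[1]
    weights = 10 ** np.arange(c - 1, -1, -1)
    return pd.Series(frame.to_numpy() @ weights, index=frame.index)


# Both approaches should agree for single-digit input.
results_match = bool(
    (concat_strings(df).to_numpy() == concat_arithmetic(df).to_numpy()).all()
)
print("results match:", results_match)

runs = 3
for func in (concat_strings, concat_arithmetic):
    seconds = timeit.timeit(lambda: func(df), number=runs) / runs
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```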

edited Dec 19 '20 at 04:27

answered Dec 19 '20 at 04:10

Ch3steR

Using arithmetic here is very elegant, good work! – cs95 Dec 19 '20 at 05:58
This is a very nicely thought out trick! – anky Dec 19 '20 at 06:15
