1

I am trying to combine all data from all columns into one column. Like this:

A B C                X
1 2 3               123
3 2 1      -->      321 
5 6 7               567

Bearing in mind that we don't know the number of columns and rows.

I tried to solve it like this, but It doesn't work.

db.assign(sum = db.apply(''.join, axis = 1)).drop([db.index], axis = 1)

Thanks in advance

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158

2 Answers2

4

Does this work?

df.astype(str).sum(axis=1).astype(int)

Given:

df = pd.DataFrame({'A':[1,3,5],'B':[2,2,6],'C':[3,1,7]})

Results:

   A  B  C    X
0  1  2  3  123
1  3  2  1  321
2  5  6  7  567
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Cool one (already upvoted) but string concatenation is slow, this builds a new string for each concatenation. May not be efficient for large dfs – Ch3steR Dec 19 '20 at 04:33
  • @Chester I agree! There are more efficient ways. – Scott Boston Dec 19 '20 at 04:39
  • 1
    `df.astype(str).agg(''.join, axis=1)` might be faster, sum performs pairwise concatenation (since strings are immutable). – cs95 Dec 19 '20 at 05:59
3

We can use np.geomspace with df.mul then df.sum

c = df.shape[1]
end = 10**(c-1)
df['X'] = df.mul(np.geomspace(end, 1, num=c)).sum(1)

   A  B  C      X
0  1  2  3  123.0
1  3  2  1  321.0
2  5  6  7  567.0

Timeit analysis:

Benchmarking setup:

vals = np.random.randint(2, 10, (100_000, 6))
df = pd.DataFrame(vals)

In [93]: %%timeit
    ...: df.astype(str).sum(axis=1).astype(int)
    ...: 
    ...: 
419 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [94]: %%timeit
    ...: c = df.shape[1]
    ...: end = 10**(c-1)
    ...: df.mul(np.geomspace(end, 1, num=c)).sum(1)
    ...: 
    ...: 
8.02 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Ch3steR
  • 20,090
  • 4
  • 28
  • 58