3

I would like to create a new column that combines two existing columns. I searched online but found nothing. How can I do this?

Example:

A          B
50.631456  5.57871

C
(50.631456, 5.57871)
smci
  • So you want to put 2 columns into 1, like in C? See this: https://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas and put your data in a Series. – Lukasz Mar 04 '18 at 22:34
  • You want to combine two (numeric?) columns into one column containing a Python tuple, yes? (Not strings) – smci Mar 04 '18 at 23:26
  • By the way, if you want to do this to have (lat, long) tuple objects inside your dataframe, it's not a great idea: any function that processes them will have to give them special treatment. Better to convert to tuples only when you write to CSV, export, or pickle the dataframe. – smci Mar 04 '18 at 23:56
  • @ThibaultMambour, does one of the solutions below solve your problem? If so, consider accepting an answer (green tick on the left) so other users know. – jpp Mar 09 '18 at 01:49

3 Answers

6

list + zip is one efficient way:

df['C'] = list(zip(df.A, df.B))

#            A        B                     C
# 0  50.631456  5.57871  (50.631456, 5.57871)
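
For a self-contained version of the line above, here is a minimal sketch. The column names A and B and the first row come from the question; the second row is made-up sample data added only to show that the pairing is element-wise.

```python
import pandas as pd

# First row is from the question; the second row is hypothetical sample data.
df = pd.DataFrame({"A": [50.631456, 48.856613], "B": [5.57871, 2.352222]})

# zip pairs the two columns element-wise; list() materialises the tuples
# so pandas stores them in an object-dtype column.
df["C"] = list(zip(df["A"], df["B"]))

print(df)
```

Each cell of C is then a plain Python tuple, so the column has object dtype rather than a numeric one.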

Performance

As expected, df.apply with axis=1 calls a Python function once per row and is inefficient for large dataframes, especially when combined with a lambda.

df = pd.concat([df]*10000)

%timeit list(zip(df.A, df.B))                  # 3.14ms
%timeit df.apply(tuple, axis=1)                # 378ms
%timeit df.apply(lambda x: (x.A,x.B), axis=1)  # 577ms
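
The %timeit lines above only work in IPython or Jupyter. As a rough sketch of the same comparison in a plain script, the standard-library timeit module can be used. The helper names with_zip and with_apply and the repeat count are assumptions chosen for illustration, and absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Rebuild the single-row frame from the question and repeat it 10,000 times.
df = pd.DataFrame({"A": [50.631456], "B": [5.57871]})
big = pd.concat([df] * 10000, ignore_index=True)


def with_zip():
    # Hypothetical helper: build the tuples with list + zip.
    return list(zip(big["A"], big["B"]))


def with_apply():
    # Hypothetical helper: build the tuples row by row with apply.
    return big.apply(tuple, axis=1)


runs = 5  # arbitrary repeat count for this sketch
zip_ms = timeit.timeit(with_zip, number=runs) / runs * 1e3
apply_ms = timeit.timeit(with_apply, number=runs) / runs * 1e3
print(f"list+zip: {zip_ms:.2f} ms per run, apply: {apply_ms:.2f} ms per run")
```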
jpp
4

Check out DataFrame.apply.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, (6, 2)), columns=['a', 'b'])

df['c'] = df.apply(tuple, axis=1)
df

returns

   a  b       c
0  8  1  (8, 1)
1  3  3  (3, 3)
2  2  8  (2, 8)
3  6  2  (6, 2)
4  2  2  (2, 2)
5  8  5  (8, 5)
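
Note that df.apply(tuple, axis=1) packs every column of the row into the tuple. If the dataframe has more columns than the two you want, one option is to select them first. The sketch below assumes a hypothetical third column named extra, and also shows itertuples as an alternative way to get plain tuples.

```python
import pandas as pd

# "extra" is a hypothetical column that should stay out of the tuple.
df = pd.DataFrame({"a": [8, 3], "b": [1, 3], "extra": ["x", "y"]})

# Select only the wanted columns before applying tuple row-wise.
df["c"] = df[["a", "b"]].apply(tuple, axis=1)

# Alternative: itertuples with index=False and name=None yields plain tuples.
df["d"] = list(df[["a", "b"]].itertuples(index=False, name=None))

print(df)
```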
Alex
1

You can use apply.

df = pd.DataFrame({'A': {0: 50.631456}, 'B': {0: 5.57871}})

df
Out[162]: 
           A        B
0  50.631456  5.57871

df['C'] = df.apply(lambda x: (x.A,x.B), axis=1)

df
Out[155]: 
           A        B                     C
0  50.631456  5.57871  (50.631456, 5.57871)
Allen Qin