
I have a dataframe with columns col1 and col2, both holding integer entries. I want to join each entry of col1 with the corresponding entry of col2, with a '.' (dot) in between. I searched and found how to concatenate two columns:

df['col'] = df['col1'].map(str) + df['col2'].map(str)

and how to append a dot:

df['col'] = df['col1'].astype(str) + '.'

but I want something like this:

df['col'] = each entries of df['col1'] + '.' + each entries of df['col2']

What is the difference between .map(str) and .astype(str), and which one suits my case?

tarun sahu

1 Answer


map takes every element of the original Series and applies a function or lambda expression to it. In this compact form, your function is str(). It has more applications than that: you could, for example, transform every element and get back a new Series. It works here because the value in each DataFrame cell can be converted to a string.
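As a minimal sketch of that generality (the Series `s` and its values are made up for illustration), map accepts str just as readily as any other callable:

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
s = pd.Series([1, 2, 3])

# str is applied to each element in turn.
as_text = s.map(str)

# Any other callable works the same way, for example a lambda.
doubled = s.map(lambda x: x * 2)

print(as_text.tolist())
print(doubled.tolist())
```

Both calls return a new Series and leave `s` unchanged.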

astype is a pandas method on Series and DataFrames (NumPy arrays have one too) that casts the object to the specified type. Here it therefore makes little practical difference, except that it may be more performant, since it is a single operation rather than one call per element and is defined natively in pandas. Use timeit to verify. Note that astype, like map, returns a new object rather than mutating the existing one.
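For the case in the question, either approach can be combined with the '.' separator. The following is a sketch that assumes a small made-up dataframe with integer columns col1 and col2:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Cast each column to string, then concatenate with a dot in between.
via_astype = df["col1"].astype(str) + "." + df["col2"].astype(str)

# The same result using map(str).
via_map = df["col1"].map(str) + "." + df["col2"].map(str)

df["col"] = via_astype
print(df)
```

The two versions should produce the same strings, and the original col1 and col2 columns keep their integer type because both casts return new objects.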

Attersson
  • Thanks, Attersson. Am I right to say that if I use .map(str) I will get lower performance, but if I use .astype(str) I will get relatively good performance, because the latter is just one operation? – tarun sahu Jun 10 '18 at 11:28
  • Please time it with timeit (https://docs.python.org/2/library/timeit.html) and verify. Performance is definitely not the difference here anyway. map is more generic and not pandas-specific, whereas astype can only be used to perform a type cast. – Attersson Jun 10 '18 at 11:29