2

I am looking for the logic to concatenate the values in many columns with related data from an .xlsx file into a single column using pandas in python. The logic to combine each different column would be different depending on what information the column contains. For example:

input:  
ID,when_carpool,smoking,vehicle,passengers  
0,weekdays,yes,truck,3  
1,weekends,no,sedan,4  
2,weekdays,no,van,6  
3,weekdays,no,van,5  
4,weekends,yes,sedan,3 

I have thousands of these rows to process, note that I want to transform the value of the 'smoking' column so it's not a simple concatenation of all columns.

output:  
ID,carpool_info  
0,weekdays+smoking+truck+3  
1,weekends+nonsmoking+sedan+4  
2,weekdays+nonsmoking+van+6  
3,weekdays+nonsmoking+van+5  
4,weekends+smoking+sedan+3  
ferrios25
  • 79
  • 2
  • 10

1 Answers1

2

Join all the columns into a new one:

  df["carpool_info"] =  df.apply(lambda x: "+".join([str(x[i]) for i in range(len(x))]),axis=1) 

and then drop the other columns you don't need (see also here: Delete column from pandas DataFrame) , or just use the series carpool_Info = df["carpool_info"]

Community
  • 1
  • 1
grasshopper
  • 3,988
  • 3
  • 23
  • 29
  • Ok I tried this and it mostly works. I just need to replace the yes/no in the smoking column with smoking/nonsmoking. Thanks. – ferrios25 Jun 10 '14 at 22:46