how to combine two columns with an if/else in python pandas?

Question

I am very new to Pandas (i.e., less than 2 days). However, I can't seem to figure out the right syntax for combining two columns with an if/else condition.

Actually, I did figure out one way to do it using 'zip'. This is what I want to accomplish, but it seems there might be a more efficient way to do this in pandas.

For completeness sake, I include some pre-processing I do to make things clear:

records_data = pd.read_csv(open('records.csv'))

## pull out a year from column using a regex
source_years = records_data['source'].map(extract_year_from_source) 

## this is what I want to do more efficiently (if its possible)
records_data['year'] = [s if s else y for (s,y) in zip(source_years, records_data['year'])]

I wonder whether you can write `records_data['year'] = records_data['source'] if records_data['source'] else records_data['year']`. — minopret, Nov 28 '12 at 01:56
nope! --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 x=records_data['source'] if records_data['source'] else records_data['year'] ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() — pocketfullofcheese, Nov 28 '12 at 01:59
I'm looking at [Combining overlapping data sets](http://pandas.pydata.org/pandas-docs/stable/basics.html#combining-overlapping-data-sets). Maybe you can figure that out faster than I can. I was merely intrigued by your question. I have no familiarity with Pandas. — minopret, Nov 28 '12 at 02:05
I imagine your zip operation can be eliminated by the technique of [How to slice by columns in pandas](http://stackoverflow.com/questions/10665889/how-to-slice-by-columns-in-pandas). But I still think there's a better way that one could figure out from the documentation that I linked. — minopret, Nov 28 '12 at 02:16
seems that you could use `parse_dates` and an index col to get your dates parsed by `read_csv`. btw you do not need to `open` your file, when you pass it to `read_csv`. Please provide some example data! — bmu, Nov 28 '12 at 07:39

score 17 · Accepted Answer · edited Nov 17 '15 at 17:43

17

In pandas >= 0.10.0 try

df['year'] = df['year'].where(source_years!=0,df['year'])

and see:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking

As noted in the comments, this DOES use np.where under the hood - the difference is that pandas aligns the series with the output (so for example you can only do a partial update)

edited Nov 17 '15 at 17:43

denfromufa

5,610
13
81
138

answered Nov 30 '12 at 00:01

Jeff

125,376
21
220
187

score 10 · Answer 2 · answered Nov 28 '12 at 03:13

10

Perhaps try np.where:

import numpy as np
df['year'] = np.where(source_years,source_years,df['year'])

answered Nov 28 '12 at 03:13

unutbu

842,883
184
1,785
1,677

This is great! Unlike pandas' `.where` this allows me to specify only a scalar and it'll do the dirty work of filling in the series itself. A contrived example: `df['c'] = np.where(df.a > df.b, 1, -1)` – fantabolous Aug 18 '14 at 03:34

how to combine two columns with an if/else in python pandas?

2 Answers2