How to identify empty strings in Pandas series

Question

I have a DataFrame and want to populate 'column3' with the value of column 'name' where column 'gender' is empty, and with the value of column 'gender' otherwise:

import pandas as pd
import numpy as np

vals = {
    'name' : ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
    'gender' : ['', '', '', 'f',  'f', 'c', 'c'],
    'age' : [39, 12, 27, 13, 36, 29, 10]
}
df4 = pd.DataFrame(vals)
df4['column3'] = df4['name'] if len(df4['gender']) == 0 else df4['gender']

The result is that column3 has only values taken from 'gender'. I've tried the following statements:

df4['column3'] = np.where(df4['gender'].empty, df4['name'],df4['gender'])
df4['column3'] = df4['name'] if df4['gender'].empty else df4['gender']

Same result, so I think my code is not able to identify an empty string in a pandas DataFrame. What am I missing?

Use `df4['column3'] = np.where(df4.gender.eq(''), df4.name, df4.gender)` — Zero, Mar 23 '18 at 10:01
@Zero ok, it works :) Please create the answer and explain why my code isn't correct — Nik, Mar 23 '18 at 10:09
Please check my answer. The operation in your code is not applied to each row; to run that kind of logic row by row you should use apply with axis=1 — Menglong Li, Mar 23 '18 at 10:11
Don't use `lambda` for this. Your logic is easily vectorisable. — jpp, Mar 23 '18 at 10:17

jpp · Accepted Answer · 2018-03-23T10:44:10.817

Your numpy.where construct is perfectly fine to use.

The issue you are facing is how to test a column against an empty string. The answer is simply to check each element for equality with ''.

This is straightforward to implement:

df4['column3'] = np.where(df4['gender'] == '', df4['name'], df4['gender'])

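pd.Series.empty tests if the series has no items, i.e. no rows, not whether its elements are empty strings. Likewise, len(df4['gender']) returns the number of rows (7 here), not the length of each string. Both conditions therefore evaluate to a single False for the whole column, which is why every row takes its value from 'gender'.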
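To make the distinction concrete, here is a minimal sketch using only the 'gender' values from the question. The names `gender` and `is_blank` are chosen here for illustration and do not appear in the original post.

```python
import pandas as pd

# Same 'gender' values as in the question.
gender = pd.Series(['', '', '', 'f', 'f', 'c', 'c'])

# .empty and len() describe the Series as a whole: each gives one scalar.
print(gender.empty)   # False, because the Series has 7 rows
print(len(gender))    # 7

# Comparing with '' is element-wise and gives one boolean per row.
is_blank = gender == ''
print(is_blank.tolist())  # [True, True, True, False, False, False, False]

# Only a Series with no rows at all is "empty" in the pandas sense.
print(pd.Series([], dtype=object).empty)  # True
```

The element-wise mask is what np.where needs as its first argument; a scalar condition selects one branch for the entire column.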

Example

import pandas as pd, numpy as np

vals = {
    'name' : ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
    'gender' : ['', '', '', 'f',  'f', 'c', 'c'],
    'age' : [39, 12, 27, 13, 36, 29, 10]
}
df4 = pd.DataFrame(vals)

df4['column3'] = np.where(df4['gender'] == '', df4['name'], df4['gender'])

#    age gender name column3
# 0   39          n1      n1
# 1   12          n2      n2
# 2   27          n3      n3
# 3   13      f   n4       f
# 4   36      f   n5       f
# 5   29      c   n6       c
# 6   10      c   n7       c

ok. You are right. It works. Answer accepted because you have provided explanation. — Nik, Mar 23 '18 at 10:35

Little Bobby Tables · Answer 2 · 2018-03-23T10:17:17.760

There are many ways but I feel the following is most succinct:

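# Callable passed to .loc: receives the DataFrame, returns a boolean mask
idx = lambda x: x.gender == ''
df4.loc[idx, 'column3'] = df4.loc[idx, 'name']
# Rows not matched above are NaN in column3; fill them from gender
df4.column3 = df4.column3.fillna(df4.gender)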

edited Mar 23 '18 at 10:17

answered Mar 23 '18 at 10:05


@jpp I don't think you understand what is going on here. It is vectorised. The lambda takes a whole DataFrame `x` and does a boolean comparison on the column `gender`. `loc` then uses this as a vectorised index. This stops me from repeatedly filtering inside the `loc`. It also means that I don't create a potentially large `idx` object by actually creating the bool index. See [here](https://stackoverflow.com/questions/37102824/why-does-not-work-pandas-df-loc-lambda) for more information. – Little Bobby Tables Mar 23 '18 at 10:30
@jpp did you read my explanation? I am not using the `lambda` inside an apply. Instead of spending your time being insulting, spend some time to read my explanation. – Little Bobby Tables Mar 23 '18 at 10:35
@jpp Yes. In `lambda x: x.gender` the `x` is a DataFrame and therefore `x.gender` is a Series, i.e. it is vectorised. It is not using the lambda function on each element of the DataFrame or Series. – Little Bobby Tables Mar 23 '18 at 10:39
@jpp I wrote this above in my first explanation: "It also means that I don't create a potentially large `idx` object by actually creating the bool index" as a Series. If the original DataFrame was large then this would be a large `idx` object. – Little Bobby Tables Mar 23 '18 at 10:46
@jpp your point was that you don't like the use of element-wise lambda functions. This is not that. – Little Bobby Tables Mar 23 '18 at 10:47
I don't like `lambda` functions anywhere if they don't serve a purpose :). I dispute your stated purpose. – jpp Mar 23 '18 at 10:48
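For readers following this exchange, the sketch below illustrates what `.loc` does with a callable: it passes the whole DataFrame in and uses the returned boolean Series as a mask. The function `blank_gender` and the list `calls` are hypothetical helpers added here only to record what the callable receives; they are not part of either answer.

```python
import pandas as pd

df4 = pd.DataFrame({
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
    'gender': ['', '', '', 'f', 'f', 'c', 'c'],
})

calls = []  # hypothetical: records the type of each argument the callable gets

def blank_gender(frame):
    # .loc hands over the whole DataFrame, so this comparison is vectorised.
    calls.append(type(frame).__name__)
    return frame['gender'] == ''

from_callable = df4.loc[blank_gender, 'name']
from_mask = df4.loc[df4['gender'] == '', 'name']

print(set(calls))                       # {'DataFrame'}
print(from_callable.equals(from_mask))  # True
```

Both forms select the same rows. The callable is evaluated on the full frame rather than once per element, so the choice between them is a matter of style, not of row-by-row cost.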

score 1 · Answer 3 · answered Mar 23 '18 at 10:10

I prefer using pandas alone to do this instead of introducing numpy:

df4['column3'] = df4[['gender', 'name']].apply(lambda x: x['gender'] if x['gender'] else x['name'], axis=1)

Menglong Li


@MenglongLi I didn't. I upvoted you. I was asking why someone else downvoted you without leaving a comment. This is a good answer. – Little Bobby Tables Mar 23 '18 at 10:21
@jpp you should explain your reason before downvoting. This allows users to correct mistakes or change their way of thinking. – Little Bobby Tables Mar 23 '18 at 10:23
[**Why is pandas apply lambda slower than loop here?**](https://stackoverflow.com/questions/47749018/why-is-pandas-apply-lambda-slower-than-loop-here) – jpp Mar 23 '18 at 10:34
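If the goal is to stay within pandas while avoiding a row-wise `apply`, one possible alternative is `Series.where`, which keeps values where a condition holds and takes replacements from another Series elsewhere. This is a sketch of that option using the question's data, not something proposed in the answers above.

```python
import pandas as pd

df4 = pd.DataFrame({
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
    'gender': ['', '', '', 'f', 'f', 'c', 'c'],
    'age': [39, 12, 27, 13, 36, 29, 10],
})

# Keep 'gender' where it is non-empty; otherwise take 'name' from the same row.
df4['column3'] = df4['gender'].where(df4['gender'] != '', df4['name'])

print(df4['column3'].tolist())
# ['n1', 'n2', 'n3', 'f', 'f', 'c', 'c']
```

Like the np.where version, this works on whole columns at once, so no Python function is called per row.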
