Pandas: Create a column using first 2 letters from a text column

Question

How can I create a column from the first 2 letters of other columns, skipping NaN values? For example, I have 3 columns:

a=pd.Series(['Eyes', 'Ear', 'Hair', 'Skin'])

b=pd.Series(['Hair', 'Liver', 'Eyes', 'NaN'])

c=pd.Series(['NaN', 'Skin', 'NaN', 'NaN'])

df=pd.concat([a, b, c], axis=1)

df.columns=['First', 'Second', 'Third']

Now I want to create a fourth column that combines the first 2 letters of 'First', 'Second' and 'Third' after sorting (so that Ear comes before Hair regardless of the column), skipping NaN values.

The final output for the fourth column would look something like:

Fourth = pd.Series(['EyHa', 'EaLiSk', 'EyHa', 'Sk'])

See https://stackoverflow.com/questions/19377969/combine-two-columns-of-text-in-dataframe-in-pandas-python, you just need to slice first and then add them together – BENY, Mar 25 '18 at 06:12

jezrael · Answer 1 · 2018-03-25T06:32:26.363

2

If NaN is np.nan (a missing value):

import numpy as np
import pandas as pd

a=pd.Series(['Eyes', 'Ear', 'Hair', 'Skin'])
b=pd.Series(['Hair', 'Liver', 'Eyes', np.nan])
c=pd.Series([np.nan, 'Skin', np.nan, np.nan])
df=pd.concat([a, b, c], axis=1)
df.columns=['First', 'Second', 'Third']

df['new'] = df.apply(lambda x: ''.join(sorted([y[:2] for y in x if pd.notnull(y)])), axis=1)

Another solution:

df['new'] = [''.join([y[:2] for y in x]) for x in np.sort(df.fillna('').values, axis=1)]
# alternative
# df['new'] = [''.join(sorted([y[:2] for y in x if pd.notnull(y)])) for x in df.values]
print(df)

  First Second Third     new
0  Eyes   Hair   NaN    EyHa
1   Ear  Liver  Skin  EaLiSk
2  Hair   Eyes   NaN    EyHa
3  Skin    NaN   NaN      Sk
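
The two approaches above should agree, because sorting whole strings and then slicing gives the same prefix order as slicing first and then sorting. A minimal self-contained sketch to check this on the example data follows; the helper name `sorted_prefixes` and its parameter `n` are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First': ['Eyes', 'Ear', 'Hair', 'Skin'],
    'Second': ['Hair', 'Liver', 'Eyes', np.nan],
    'Third': [np.nan, 'Skin', np.nan, np.nan],
})

def sorted_prefixes(row, n=2):
    """Join the first n characters of each non-missing value, in sorted order."""
    # Hypothetical helper: slice each value, drop missing ones, sort the prefixes.
    return ''.join(sorted(v[:n] for v in row if pd.notnull(v)))

# Row-wise apply, as in the first solution.
by_apply = df.apply(sorted_prefixes, axis=1).tolist()

# Sort each row of full strings (missing values become ''), then slice.
by_sort = [''.join(v[:2] for v in row)
           for row in np.sort(df.fillna('').values, axis=1)]

print(by_apply)
print(by_sort)
```

The `np.sort` variant relies on empty strings contributing nothing to the joined result, so it only fits the case where missing values can safely be replaced by `''`.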

If NaN is a string:

df['new'] = df.apply(lambda x: ''.join(sorted([y[:2] for y in x if y != 'NaN'])), axis=1)

df['new'] = [''.join(sorted([y[:2] for y in x if y != 'NaN'])) for x in df.values]
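
Another option, not from the answer itself, is to turn the literal 'NaN' strings into real missing values first, so that the `pd.notnull` version works unchanged. A small sketch, assuming the placeholder is exactly the string 'NaN' (the name `cleaned` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'First': ['Eyes', 'Ear', 'Hair', 'Skin'],
    'Second': ['Hair', 'Liver', 'Eyes', 'NaN'],
    'Third': ['NaN', 'Skin', 'NaN', 'NaN'],
})

# Replace cells equal to the string 'NaN' with a real missing value.
cleaned = df.mask(df.eq('NaN'))

# Same row-wise logic as for true missing values.
df['new'] = cleaned.apply(
    lambda row: ''.join(sorted(v[:2] for v in row if pd.notnull(v))),
    axis=1,
)
result = df['new'].tolist()
print(df)
```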

edited Mar 25 '18 at 06:32

answered Mar 25 '18 at 06:13

jezrael


@Aran-Fey - thank you for the comment. In the question the `NaN`s are strings, but in real data they are missing values, so I added solutions for both situations. – jezrael Mar 25 '18 at 06:33
Yes, the NaNs are missing values. I tried your approach and got an error message: ("'int' object is not subscriptable", 'occurred at index 0'). Any idea why? – Dsh M Mar 25 '18 at 11:36
@DshM - I think there are some numeric values, try `df['new'] = df.apply(lambda x: ''.join(sorted([str(y)[:2] for y in x if pd.notnull(y)])), axis=1)` – jezrael Mar 25 '18 at 11:38
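
To illustrate the error discussed in these comments: slicing an integer with `[:2]` raises a `TypeError`, while converting each value with `str()` first avoids it. The frame below is hypothetical data with one numeric value mixed in, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one cell holds an int instead of a string.
mixed = pd.DataFrame({
    'First': ['Eyes', 42],
    'Second': ['Hair', 'Liver'],
    'Third': [np.nan, np.nan],
})

def sorted_prefixes(row):
    # str(v) makes slicing safe for numbers as well as strings.
    return ''.join(sorted(str(v)[:2] for v in row if pd.notnull(v)))

result = mixed.apply(sorted_prefixes, axis=1).tolist()
print(result)
```

Note that numbers are sorted as text here, so digits come before letters.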
