
I have two columns, company_size and company_type. company_size contains NaN values, which I want to fill based on the company_type column. How can I map company_type to a particular value so that I can fill the NaN (i.e. missing) values in company_size based on company_type?

Example data:

 company_size    company_type
   10-50           startup
   1000+           PvtLtd
   NaN             Public Sector
   NaN             PvtLtd
   100-500         Public Sector
   NaN             startup
   100-500         Public Sector

Now, based on company_type, I want to fill company_size:

startup -> 10-50
Public Sector -> 100-500
PvtLtd -> 1000+

Note both company_type and company_size are categorical variables.
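
For reference, a minimal reproducible construction of the example above (the names df and size_map are mine, not from the original post; plain strings are used here rather than a pandas Categorical dtype):

import pandas as pd
import numpy as np

# Reconstruction of the example frame shown above
df = pd.DataFrame({
    'company_size': ['10-50', '1000+', np.nan, np.nan, '100-500', np.nan, '100-500'],
    'company_type': ['startup', 'PvtLtd', 'Public Sector', 'PvtLtd',
                     'Public Sector', 'startup', 'Public Sector'],
})

# The desired company_type -> company_size mapping from the question
size_map = {'startup': '10-50', 'Public Sector': '100-500', 'PvtLtd': '1000+'}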

stone rock
  • I think there's a better dupe that actually does it from a dataframe not a dictionary – user3483203 Jul 19 '18 at 18:22
  • @xyzjayne Here I want to fill NaN values how can I do that in pandas ? – stone rock Jul 19 '18 at 18:22
  • @stonerock you can select the rows where company_size is NaN – xyzjayne Jul 19 '18 at 18:23
  • Perhaps this one? [remap values in pandas column with a dict](https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict) – ALollz Jul 19 '18 at 18:23
  • Have you tried anything? It is quite similar to your last question, which was answered. :)) – harvpan Jul 19 '18 at 18:24
  • I don't think it's mapping at all. Why not just groupby `company_type` and ffill or bfill – user3483203 Jul 19 '18 at 18:25
  • @HarvIpan I don't think `between()` and `where()` will work – stone rock Jul 19 '18 at 18:25
  • Instead of posting comments can someone please post answer which will be more helpful :) – stone rock Jul 19 '18 at 18:26
  • Instead of asking questions that are already answered on the forum you could have done some research first and saved all of us some time :) – xyzjayne Jul 19 '18 at 18:28
  • I think people are reluctant to post an answer because this question is almost certainly answered elsewhere on this site, and an almost identical question (asked by you) was answered earlier today, with a method that could easily be expanded to answer this question. – ALollz Jul 19 '18 at 18:29
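
For completeness, a small sketch of the dict-based approach the comments point to: map company_type through the dictionary and fill only the missing values (df and size_map as sketched in the question above are assumptions). If the columns are true pandas Categoricals this still works, as long as the fill values are existing categories:

# Map company_type through the dict and use the result to fill only the NaN rows
df['company_size'] = df['company_size'].fillna(df['company_type'].map(size_map))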

2 Answers


You can first sort by company_size to make sure that the NaNs are at the end, then use groupby and ffill, then sort by index again to reinstate your original order:

df.sort_values('company_size').groupby('company_type').ffill().sort_index()

  company_size   company_type
0        10-50        startup
1        1000+         PvtLtd
2      100-500  Public Sector
3        1000+         PvtLtd
4      100-500  Public Sector
5        10-50        startup
6      100-500  Public Sector
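
A slight variation on the same idea, assuming the df sketched in the question, if you only want to fill company_size in place rather than build a whole new frame (the column selection is my addition, not part of the original answer):

# Sort so the NaNs in each group come last, forward-fill within each group,
# then rely on index alignment to put the filled values back in the original order
df['company_size'] = (df.sort_values('company_size')
                        .groupby('company_type')['company_size']
                        .ffill()
                        .sort_index())
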
sacuL

groupby your company_type, and use transform('first') to get rid of the NaN values:

df['company_size'] = df.groupby("company_type")["company_size"].transform("first")

  company_size  company_type
0        10-50       startup
1        1000+        PvtLtd
2      100-500  PublicSector
3        1000+        PvtLtd
4      100-500  PublicSector
5        10-50       startup
6      100-500  PublicSector
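
One caveat: transform('first') broadcasts each group's first non-missing value to every row, so it also rewrites values that were already present (harmless here, since they agree within each group). If you would rather touch only the missing entries, a hedged alternative is to combine it with fillna, assuming the same df as above:

# Fill only the NaN rows with each group's first non-missing value
df['company_size'] = df['company_size'].fillna(
    df.groupby('company_type')['company_size'].transform('first'))
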
user3483203