
I am working on the Developers 2020 survey, and I want to summarise the "DevType" column, whose value counts look like this:

Developer, full-stack                                                                                           4424
Developer, back-end                                                                                             3086
Developer, back-end;Developer, front-end;Developer, full-stack                                                  2227
Developer, back-end;Developer, full-stack                                                                       1476
Developer, front-end                                                                                            1401
Developer, mobile                                                                                               1251
Developer, front-end;Developer, full-stack                                                                       830
Developer, back-end;Developer, desktop or enterprise applications;Developer, front-end;Developer, full-stack     813
Developer, back-end;Developer, desktop or enterprise applications                                                650
Developer, desktop or enterprise applications                                                                    606
Name: DevType, dtype: int64.

I want to analyse this column first, so I would like to shorten these titles to compact names that are presentable on a graph. Then I would like to assign numbers to this column. I thought of doing dfuk["#DevType"]=dfuk["DevType"].apply(lambda x: len(str(x).split(';'))), but it's not a great solution.
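One caveat with the apply/lambda approach: str(x) turns a missing answer (NaN) into the string 'nan', so respondents who skipped the question are counted as having one role. Below is a minimal sketch using pandas' vectorized string methods instead. It assumes the column holds ';'-separated strings with NaN for blanks; the small dfuk frame is made-up sample data, not the real survey. It also shows splitting the combinations into one row per individual role, which is one way to get a compact per-role summary for a graph.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mimicking the semicolon-separated DevType column
dfuk = pd.DataFrame({
    "DevType": [
        "Developer, full-stack",
        "Developer, back-end;Developer, front-end;Developer, full-stack",
        "Developer, mobile",
        np.nan,  # a respondent who skipped the question
    ]
})

# Number of roles per respondent: separators + 1, and 0 (not 1) for missing answers
dfuk["#DevType"] = dfuk["DevType"].str.count(";").add(1).fillna(0).astype(int)

# One row per individual role, so each role can be counted on its own
roles = dfuk["DevType"].str.split(";").explode().dropna()
role_counts = roles.value_counts()

print(dfuk)
print(role_counts)
```

Counting individual roles keeps the number of bars on a graph small, whereas value_counts() on the raw column treats every combination of roles as a separate category.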

Please help me find solutions to these problems and thank you in advance!



I would suggest the following:

  • remove the word "Developer" entirely: it is repeated needlessly everywhere, and there is no doubt about the domain you are dealing with here;
  • take the first letter of each word ('fs' for 'full-stack') if the remaining names are still too long;
  • use pd.Categorical to assign a number to each type of dev.
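The first two suggestions could be sketched as follows. This is an illustration under assumptions, not the answerer's code: it assumes the ';'-separated format shown in the question, and abbreviate is a hypothetical helper name. Words are split on spaces and hyphens, so 'full-stack' becomes 'fs' as in the suggestion above.

```python
import re

import pandas as pd

def abbreviate(devtype):
    """Hypothetical helper: keep the first letter of each word in every
    ';'-separated role, e.g. 'back-end;full-stack' -> 'be;fs'."""
    if pd.isna(devtype):
        return devtype  # leave missing answers untouched
    roles = devtype.split(";")
    # Split each role on whitespace and hyphens, then join the initials
    return ";".join(
        "".join(word[0] for word in re.split(r"[\s-]+", role) if word)
        for role in roles
    )

s = pd.Series([
    "Developer, full-stack",
    "Developer, back-end;Developer, front-end;Developer, full-stack",
    "Developer, desktop or enterprise applications",
])

# Step 1: drop the repeated "Developer, " prefix everywhere
short = s.str.replace("Developer, ", "", regex=False)

# Step 2 (only if names are still too long): initials of each word
abbrev = short.map(abbreviate)

print(short.tolist())
print(abbrev.tolist())
```

Initials can become cryptic ('desktop or enterprise applications' turns into 'doea'), so it may be worth applying step 2 only to the longest names, or keeping a legend for the graph.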

See this post for the last point: "Pandas: convert categories to numbers".
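For the pd.Categorical suggestion, here is a minimal sketch assuming the names have already been shortened; the df below is made-up sample data. Converting to the category dtype gives each distinct value an integer code through .cat.codes, with -1 marking missing values. By default the categories are sorted alphabetically, so the codes are arbitrary labels with no meaningful order.

```python
import pandas as pd

# Hypothetical sample of already-shortened DevType values
df = pd.DataFrame({
    "DevType": ["full-stack", "back-end", "back-end;full-stack", "full-stack", None]
})

# Convert to a categorical column; each distinct string becomes one category
df["DevType"] = df["DevType"].astype("category")

# Integer code per row (-1 marks a missing value)
df["DevTypeCode"] = df["DevType"].cat.codes

# Lookup table from code back to label, handy for a graph legend
mapping = dict(enumerate(df["DevType"].cat.categories))

print(df)
print(mapping)
```

Note that every distinct combination of roles (such as "back-end;full-stack") gets its own code here; if you want one code per individual role, split the column on ';' first.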

Laurent
  • If this answer has solved your question, please consider accepting it by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this. If you wish, you can also add +10 points to the author of any good answer by clicking the upper gray triangle. In any case, have a nice day. – Laurent Mar 22 '21 at 14:21