
I'm a Python / Pandas beginner and am currently working on some projects in the IPython notebook. I just ran into a little problem that I couldn't solve with my book or by googling, maybe because I'm not exactly sure what term or function to search for.

Let's say I have a DataFrame with a column

Industry Category
Software/Industry Systems
Software/Medical Systems
Software/Payment 
Electronic Components
Database Applications
Online Communities
Medical Equipment
Mobile Phones

What I want is to create a new column that assigns each value in "Industry Category" to a "Parent Category". In this example just "Software" and "Hardware".

Industry Category                    Parent Category
Software/Industry Systems            Software
Software/Medical Systems             Software 
Software/Payment                     Software 
Electronic Components                Hardware
Database Applications                Software
Online Communities                   Software 
Medical Equipment                    Hardware
Mobile Phones                        Hardware

Note: There are about 600 Industry Category items in my list and about 30 Categories I have to sort them into.

So it would be great if there's some option to do the job with a *.csv with two columns: on the left all "Industry Category" items, and on the right the desired "Parent Category" I'd like to apply to the dataset.

Thanks!

Christopher
  • So are you saying you have 2 csvs, one like your first one and another that maps the industry Category with the parent category? – EdChum Mar 23 '15 at 14:45
  • Assuming you can get your data into a dict format then it's a dupe of http://stackoverflow.com/q/20250771/3005188 – Ffisegydd Mar 23 '15 at 14:46
  • @Ffisegydd would you expect `replace` to be faster than `map`? If the second csv was just a lookup and you set the index to be the 'Industry Category', I would expect `map` to be faster – EdChum Mar 23 '15 at 14:48
  • Well, just assume I have a DataFrame and want to create a new column that assigns a Parent Category to the values in "Industry Category". But I guess the _di_ is going in the right direction. – Christopher Mar 23 '15 at 14:49
  • @EdChum I have no real grasp on the relative time differences between `map` and `replace` unfortunately. – Ffisegydd Mar 23 '15 at 14:52
  • I think I got the idea. I have to create a csv and apply the 2nd step of this [link](http://stackoverflow.com/questions/23057219/how-to-convert-csv-to-dictionary-using-pandas). Once I have the dict, I have to map the df with .map(category_list.get) – Christopher Mar 23 '15 at 14:54
  • @Christopher I think you can either create a dict or just create a df, but in the latter case the index needs to be the 'Industry Category'. You can use `map` only if the keys are unique, which is always true for a dict but, for a df, has to hold for your csv data. I expect `map` to be the fastest method, based on personal experience – EdChum Mar 23 '15 at 15:17
  • @Christopher what works exactly? can you post as an answer – EdChum Mar 23 '15 at 15:43

1 Answer


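I do this quite a lot. I would create a dictionary that maps each "Industry Category" to its "Parent Category" and use apply with a lambda. Note that every value in the column needs a key in the dictionary, otherwise the lookup raises a KeyError.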

example_dict = {'Software/Industry Systems':'Software','Software/Payment':'Software'}

dataframe['Parent Category'] = dataframe['Industry Category'].apply(lambda value: example_dict[value])
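
For the two-column csv mentioned in the question, a rough sketch along the lines of the comments might look like the following. It swaps `apply` for `Series.map`, which leaves unmapped values as NaN instead of raising an error. The csv contents, the `lookup` and `category_map` names, and the sample rows are made up for illustration; in practice the `io.StringIO` object would be replaced by the path to the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the two-column lookup csv described in the question.
lookup_csv = io.StringIO(
    "Industry Category,Parent Category\n"
    "Software/Industry Systems,Software\n"
    "Software/Payment,Software\n"
    "Electronic Components,Hardware\n"
)

# Build a Series indexed by "Industry Category"; the index must be unique for map().
lookup = pd.read_csv(lookup_csv)
category_map = lookup.set_index("Industry Category")["Parent Category"]

# Small sample DataFrame (assumed data); note the trailing space in one value.
dataframe = pd.DataFrame(
    {
        "Industry Category": [
            "Software/Industry Systems",
            "Software/Payment ",
            "Electronic Components",
            "Mobile Phones",
        ]
    }
)

# Strip stray whitespace before the lookup, then map each value to its parent.
dataframe["Parent Category"] = (
    dataframe["Industry Category"].str.strip().map(category_map)
)

# Values without an entry in the csv end up as NaN and can be listed for review.
unmapped = dataframe.loc[
    dataframe["Parent Category"].isna(), "Industry Category"
].tolist()
```

With about 600 items it is probably worth checking `unmapped` once, since it shows which entries are still missing from the csv.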
kennes