-1

In my csv file I have column 'category' where I need to set a vertical for each category and save value in the new additional column. I know how to read csv and save data frame into a new file including the new column creation in Pandas. However I need some help regarding the logic for my scenario.

my.csv:

id            category
1    auto,auto.car_dealers
2    hotelstravel,hotelstravel.hotels
3    shopping,shopping.homeandgarden,shopping.homeandgarden.appliances
4    financialservices,financialservices.insurance
5    
6    realestate
7    pets,pets.petservices,pets.petservices.petinsurance
8    homeservices,homeservices.windowsinstallation
9    professional

Rules that I need to apply: 1. If category column has no value then set Vertical column = Other 2. If category column has value then check if value is a single word and then set vertical depending on the value. If auto then set to Automotive, if hotelstravel then set to Travel etc. 3. If value has more than one word then take the word before the first comma and set the vertical value based on category. If auto then set to Automotive, if hotelstravel then set to Travel etc.

Expected output.csv:

id            category                                                       vertical 
1    auto,auto.car_dealers                                                   Automotive
2    hotelstravel,hotelstravel.hotels                                        Travel
3    shopping,shopping.homeandgarden,shopping.homeandgarden.appliances       Retail
4    financialservices,financialservices.insurance                           Financial
5                                                                            Other
6    realestate                                                              Real Estate
7    pets,pets.petservices,pets.petservices.petinsurance                     Pet Services
8    homeservices,homeservices.windowsinstallation                           Home Services
9    professional                                                            Professional Services

my code so far:

import pandas as pd
df = pd.read_csv('path/to/my.csv')

#do something here and then something like
df.loc[df['category'] == 'auto', 'vertical'] = 'Automotive'

df.to_csv('path/to/output.csv', index=False)

Any help with this will be much appreciated. Thank you in advance!

Baobab1988
  • 685
  • 13
  • 33

1 Answers1

0

You will likely need to iterate through the category column and perform checks on the value. You can use something along the following (more info):

for index, row in df.iterrows():
    if (row['Category'].is_a_list()):
        tokens = row['Category'].split()
        row['Vertical'] = tokens[0]
    else:
         ....

And because you're wanting to change values, i.e. 'hotelstravel' to 'Travel', you might need a dictionary set up with the Category name as the key and the Vertical name as the value so that you can quickly convert it