I have real estate properties and their details (17 columns) in a CSV file (nearly half a million entries). One of the columns provides a location, but it is too detailed. I want to categorize my entries, so I want to simplify the location into more generic areas. The areas I want to categorize the entries into would be in a list such as:
keywords = ['Downtown','Park View','Industrial District', ... ]
So ideally I would like to take an entry that has, for example, `Sky Tower Downtown Los Angeles` and then classify it as `Downtown`.
So the task is to first detect the keyword in the `Location` column and then add it to a new column (right beside it if possible). If no keyword is found in the entry, I would like to classify it as `Other`.
It would look something like this:
Date | Record_Type | Location | Proterty_Type | ... | Price |
---|---|---|---|---|---|
19-Mar-21 | Active Listing | Sky Tower Downtown Los Angeles | Apartment | ... | 15000 |
19-Mar-21 | Active Listing | Central Park Residential Tower, 5th Avenue | Apartment | ... | 17000 |
20-Mar-21 | Active Listing | Meadow Gardens, Park View | Villa | ... | 125000 |
To something like:
Date | Record_Type | Location | Area | Proterty_Type | ... | Price |
---|---|---|---|---|---|---|
19-Mar-21 | Active Listing | Sky Tower Downtown Los Angeles | Downtown | Apartment | ... | 15000 |
19-Mar-21 | Active Listing | Central Park Residential Tower, 5th Avenue | Other | Apartment | ... | 17000 |
20-Mar-21 | Active Listing | Meadow Gardens, Park View | Park View | Villa | ... | 125000 |
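For illustration, the transformation between the two tables above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the keyword list is cut down to the three names given above, the sample frame holds only a few of the 17 columns, and matching is case-sensitive and returns the first keyword found in each location.

```python
import re
import pandas as pd

# Assumed keyword list; the real list is longer (it is elided above).
keywords = ['Downtown', 'Park View', 'Industrial District']

# Sample rows copied from the table above (only a few columns shown).
df = pd.DataFrame({
    'Date': ['19-Mar-21', '19-Mar-21', '20-Mar-21'],
    'Location': [
        'Sky Tower Downtown Los Angeles',
        'Central Park Residential Tower, 5th Avenue',
        'Meadow Gardens, Park View',
    ],
    'Price': [15000, 17000, 125000],
})

# One alternation pattern with a single capture group. Keywords are escaped
# and sorted longest first so a longer keyword is tried before a shorter one.
ordered = sorted(keywords, key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(k) for k in ordered) + ')'

# str.extract gives the first match per row and NaN when nothing matches,
# so fillna supplies the 'Other' fallback.
area = df['Location'].str.extract(pattern, expand=False).fillna('Other')

# Place the new column directly to the right of Location.
df.insert(df.columns.get_loc('Location') + 1, 'Area', area)

print(df)
```

Note that `Central Park Residential Tower, 5th Avenue` falls into `Other` here because it contains `Park` but not the full keyword `Park View`.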
Finally, it should all be saved to a new CSV file. Ideally, I would also like to use `pandas` to read and write the CSV.
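The read/write round trip could look something like the sketch below. The function name `add_area_column`, the `chunksize` default, and the file names in the usage note are hypothetical and chosen for illustration. Reading in chunks is optional at this file size; it is shown only as one way to keep memory use bounded. The sketch assumes every chunk has a text `Location` column.

```python
import re
import pandas as pd

def add_area_column(src_path, dst_path, keywords, chunksize=100_000):
    """Copy src_path to dst_path, adding an Area column next to Location."""
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(k) for k in ordered) + ')'

    first = True
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        # First keyword found in each location, or 'Other' if none matches.
        area = chunk['Location'].str.extract(pattern, expand=False).fillna('Other')
        chunk.insert(chunk.columns.get_loc('Location') + 1, 'Area', area)

        # Write the header once, then append the remaining chunks.
        chunk.to_csv(dst_path, mode='w' if first else 'a',
                     header=first, index=False)
        first = False
```

A call might then look like `add_area_column('listings.csv', 'listings_with_area.csv', keywords)`, where both file names are placeholders.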
Thanks in advance!
Edit: I have tried methods from threads such as the following, but I get errors and I don't know what's wrong, so I'm open to fresh ideas.