0

I started out with the dataframe below

    0
0   Alabama[edit]
1   Auburn (Auburn University)[1]
2   Florence (University of North Alabama)
3   Jacksonville (Jacksonville State University)[2]
4   Livingston (University of West Alabama)[2]

Then I cleaned it up to this

         State        RegionName
0        Alabama    
1        Auburn 
2        Florence   
3        Jacksonville   
4        Livingston 

I am not sure how to move Auburn, Florence, Jacksonville and Livingston to RegionName as they are regions in Alabama. Also I need to apply the moving of regions(500+) to their respective states(50 states) for the rest of my data.

Below is how the data is mapped(I added in the type of each row on the side)

State-->Alaska  
Region->Fairbanks   
State-->Arizona 
Region->Flagstaff   
Region->Tempe   
Region->Tucson

Expected answer:

    State     RegionName
0  Alabama     Auburn
1  Alabama     Florence
2  Alabama     Jacksonville
3  Alabama     Livingston
yudhiesh
  • 6,383
  • 3
  • 16
  • 49

1 Answers1

0

This is what I would do, starting with original data:

df['State'] = df[0].str.extract('(.*)\[edit\]').ffill()

df['RegionName'] = df[0].str.extract('(.*) \(')

df = df.dropna(subset=['RegionName'])

Output:

                                                 0    State    RegionName
1                    Auburn (Auburn University)[1]  Alabama        Auburn
2           Florence (University of North Alabama)  Alabama      Florence
3  Jacksonville (Jacksonville State University)[2]  Alabama  Jacksonville
4       Livingston (University of West Alabama)[2]  Alabama    Livingston
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74