0

I have a dataframe:

         State                           RegionName
0      Alabama                              Alabama
1          NaN                               Auburn
2          NaN                             Florence
3          NaN                         Jacksonville
4          NaN                           Livingston
5          NaN                           Montevallo
6          NaN                                 Troy
7          NaN                           Tuscaloosa
8          NaN                             Tuskegee
9       Alaska                               Alaska
10         NaN                            Fairbanks
11     Arizona                              Arizona
12         NaN                            Flagstaff
13         NaN                                Tempe
14         NaN                               Tucson

How can I return

DataFrame([["Alabama", "Auburn"], 
           ["Alabama", "Florence"], .
            .., 
           ["Alaska", "Fairbanks"],  
           ["Arizona", "Flagstaff"],  
           ...], columns=["State", "RegionName"])

so all values are mergend nicely?

I had tried: df['State'] = df['State'].apply(lambda x: df['RegionName']) but it lacks the logic of assigning new State for a RegionName when the new one starts.

feedthemachine
  • 592
  • 2
  • 11
  • 29

2 Answers2

1

Need ffill:

df['State'] = df['State'].ffill()
print (df)
      State    RegionName
0   Alabama       Alabama
1   Alabama        Auburn
2   Alabama      Florence
3   Alabama  Jacksonville
4   Alabama    Livingston
5   Alabama    Montevallo
6   Alabama          Troy
7   Alabama    Tuscaloosa
8   Alabama      Tuskegee
9    Alaska        Alaska
10   Alaska     Fairbanks
11  Arizona       Arizona
12  Arizona     Flagstaff
13  Arizona         Tempe
14  Arizona        Tucson
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 2
    It seems very unlikely to me this is the first question which has `.ffill()` as an answer. – DSM May 17 '17 at 11:42
  • @DSM - I think it is super ;) – jezrael May 17 '17 at 11:44
  • I guess what I was hinting at is that a question which has such a simple answer has probably been asked before, and instead of answering it again we should find the dupe. When I get a second I can find a good target myself if no one else has yet. – DSM May 17 '17 at 11:45
  • @DSM - I agree, you are absolutely right. – jezrael May 17 '17 at 11:46
  • @jezrael how can I delete the duplicate rows, e.g `Alabama Alabama`, `Alaska Alaska`? I guess something similar to `df[['State'] != ['RegionName']]` has to be applied – feedthemachine May 17 '17 at 11:58
  • `lst = list()` `for i in df.iterrows():` `if i[1][0] != i[1][1]:` `lst.append(i[1])` `newdf = pd.DataFrame(lst)` – feedthemachine May 17 '17 at 12:17
  • Sorry, I dont get comment. – jezrael May 17 '17 at 12:17
  • Give me a some time. – jezrael May 17 '17 at 12:18
  • 1
    You are really close `df1 = df[df['State'] != df['RegionName']]` – jezrael May 17 '17 at 12:18
  • Thanks. The problem with this solution is the index of df1 has some gaps in it for items have been removed: `print(df.index)` returns `Int64Index([ 1, 2, 3, 4, 5, 6, 7, 8, 10, 12,` As you can see 0 and 9 are gone (0 stood for Alabama, 9 was Alaska). I have tried to reset the index of df1 but there was no luck – feedthemachine May 17 '17 at 12:26
  • 1
    Then use parameter `drop=True` - `df1 = df[df['State'] != df['RegionName']].reset_index(drop=True)` – jezrael May 17 '17 at 12:27
0

You can try fillna.

df=pd.DataFrame([["Alabama", "Auburn"],
               [np.nan, "Florence"],
               [np.nan, "Fairbanks"],
               ["Arizona", "Flagstaff"]], columns=["State", "RegionName"])
df
Out[94]: 
     State RegionName
0  Alabama     Auburn
1      NaN   Florence
2      NaN  Fairbanks
3  Arizona  Flagstaff


df.fillna(method='ffill')
Out[95]: 
     State RegionName
0  Alabama     Auburn
1  Alabama   Florence
2  Alabama  Fairbanks
BENY
  • 317,841
  • 20
  • 164
  • 234