0

I have two dataframes ['stations_live'] and ['stations_master']. They have columns containing the same information type (station name, latitude, longitude etc..) but named differently. I am trying to rename the column headers of station_live to match those in stations_master:

stations_live_df.rename(columns={'full_station_name':'Station',
                                 'lat':'Latitude',
                                 'lon':'Longitude',
                                 'is_charging_station':'E-Station?'},
                        inplace = True)

However, this seems to result in the actual values in the "full_station_name" column being altered ("1st & Main" instead of "0001 1st & Main"). Here, the #### station code is important to generate a match with the other dataframe.

This (station_id + station_name) format was achieved by combining two columns within the dataset

stations_live_df['station_id'] = stations_live_df['station_id'].astype(str)
stations_live_df['full_station_name'] = stations_live_df['station_id'] + " " + stations_live_df['name']

Imedaitiely before running the df.rename() command, the 1st item in the "full_station_names" column is "0001 1st & Main" as it should be. Immediately after that command, it reverts to "1st & Main" and therefore I get 0 matched to the other dataframe.

Please let me know what is going on here if you know. I am VERY confused.

#------------------------------------------------------------------------

Attempt to reproduce outside of script here DOES NOT reproduce the issue but is the method I use in the real script but with made up dfs.

    #dfs
stations_live_df = pd.DataFrame({"station_id": ["0001", "0002", "0003"], "name": ["1st & Main", "2nd & Main", "3rd & Main"]})
stations_master = pd.DataFrame({"Station": ["0001 1st & Main", "0002 2nd & Main", "0003 3rd & Main"]})

#merge id and name in live df
stations_live_df['full_station_name'] = stations_live_df['station_id'] + " " + stations_live_df['name']

#rename stations_live_df columns to reflect those in stations master
stations_live_df.rename(columns = {'full_station_name':'Station'}, inplace = True)

# Get the values in the specified column from both DataFrames
live_stations = stations_live_df['Station'].tolist()
master_stations = stations_master['Station'].tolist()

# Find the values in df1 that do not have a match in df2
new_stations = [value for value in live_stations if value not in master_stations]

print(len(new_stations))
Max Duso
  • 305
  • 1
  • 4
  • 15
  • 1
    I can't reproduce this - are you working in a notebook? which version of Pandas are you using `pd.__version__`? – ti7 Jul 21 '23 at 21:25
  • Please make a [mre]. For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Jul 21 '23 at 21:59
  • I assume this is not actually how the dataframe is created. Please note that if you import a CSV file with "0001", "0002", "0003", etc., pandas will automatically decide those are integers, not strings, and of course there are no leading 0s in integers. – Tim Roberts Jul 21 '23 at 23:08
  • @TimRoberts . If I `print(stations_live_df['full_station_name'])` (this is before the rename of the header), then I get the correct format "0001 1st & Main". So this must mean (as I also just confirmed) that the station id was already stored as a string. Although you are right that the dataframe was actually pulled from a json file and converted using `pd.DataFrame` – Max Duso Jul 21 '23 at 23:19
  • 1
    I had a similar experience and wound up passing in the column names I wanted rather than use 'rename'. Try something like: stations_live_df.columns = ['station_id', 'name', 'Station']. With a lot of columns, it's klunky, but it did finally allow me to merge my dataframes correctly – Jonathan Leon Jul 22 '23 at 20:42
  • @JonathanLeon Yeah that's a good option. Just worried that future iterations of the dataset will not have the same column indexes and it will throw off the script. – Max Duso Aug 11 '23 at 16:43

0 Answers0