I have two dataframes ['stations_live'] and ['stations_master']. They have columns containing the same information type (station name, latitude, longitude etc..) but named differently. I am trying to rename the column headers of station_live to match those in stations_master:
stations_live_df.rename(columns={'full_station_name':'Station',
'lat':'Latitude',
'lon':'Longitude',
'is_charging_station':'E-Station?'},
inplace = True)
However, this seems to result in the actual values in the "full_station_name" column being altered ("1st & Main" instead of "0001 1st & Main"). Here, the #### station code is important to generate a match with the other dataframe.
This (station_id + station_name) format was achieved by combining two columns within the dataset
stations_live_df['station_id'] = stations_live_df['station_id'].astype(str)
stations_live_df['full_station_name'] = stations_live_df['station_id'] + " " + stations_live_df['name']
Imedaitiely before running the df.rename()
command, the 1st item in the "full_station_names" column is "0001 1st & Main" as it should be. Immediately after that command, it reverts to "1st & Main" and therefore I get 0 matched to the other dataframe.
Please let me know what is going on here if you know. I am VERY confused.
#------------------------------------------------------------------------
Attempt to reproduce outside of script here DOES NOT reproduce the issue but is the method I use in the real script but with made up dfs.
#dfs
stations_live_df = pd.DataFrame({"station_id": ["0001", "0002", "0003"], "name": ["1st & Main", "2nd & Main", "3rd & Main"]})
stations_master = pd.DataFrame({"Station": ["0001 1st & Main", "0002 2nd & Main", "0003 3rd & Main"]})
#merge id and name in live df
stations_live_df['full_station_name'] = stations_live_df['station_id'] + " " + stations_live_df['name']
#rename stations_live_df columns to reflect those in stations master
stations_live_df.rename(columns = {'full_station_name':'Station'}, inplace = True)
# Get the values in the specified column from both DataFrames
live_stations = stations_live_df['Station'].tolist()
master_stations = stations_master['Station'].tolist()
# Find the values in df1 that do not have a match in df2
new_stations = [value for value in live_stations if value not in master_stations]
print(len(new_stations))