I am trying to do some data cleaning. I am working with a dataset from a bike-sharing company that lists every trip made on its system. Each ride has a station name and a station ID, but in many rows the `station_name` is written inconsistently.
I would like every row to have the same `station_name` for a given `station_id`, but right now each ID commonly appears with 5 different names, and there are hundreds of IDs.
I managed to create a new dataset that looks something like this:
station_id | station_name |
---|---|
1032 | Public Rack - Kedvale Ave & 63rd St |
1033 | Public Rack - Pulaski Rd & 65th St |
1038 | Public Rack - Kedzie Ave & 62nd Pl |
1039 | Public Rack - Kedzie Ave & 61st Pl |
...
It lists every ID and its station name once. I would like R to look at each row of the main dataset, take its `station_id`, find that ID in this new dataset, and replace whatever is in the main `station_name` with the `station_name` from the new dataset.
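What is described here is a lookup-and-replace, which in dplyr is usually done with a join. A minimal sketch, assuming the main dataset is called `trips` and the lookup table `stations` (both names and their sample rows are hypothetical), and that `station_id` has the same type in both tables:

```r
library(dplyr)

# Hypothetical main dataset: several spellings for the same station_id
trips <- tibble(
  ride_id      = 1:4,
  station_id   = c(1032, 1033, 1032, 9999),
  station_name = c("Kedvale & 63rd",
                   "Pulaski Rd & 65th St",
                   "Public Rack - Kedvale Ave & 63rd St",
                   "Unknown St")
)

# Hypothetical lookup table: exactly one name per station_id
stations <- tibble(
  station_id   = c(1032, 1033, 1038, 1039),
  station_name = c("Public Rack - Kedvale Ave & 63rd St",
                   "Public Rack - Pulaski Rd & 65th St",
                   "Public Rack - Kedzie Ave & 62nd Pl",
                   "Public Rack - Kedzie Ave & 61st Pl")
)

trips_clean <- trips %>%
  # Bring in the lookup name under a separate column name to avoid a clash
  left_join(rename(stations, clean_name = station_name), by = "station_id") %>%
  # Use the lookup name; keep the old one if the ID is not in the lookup
  mutate(station_name = coalesce(clean_name, station_name)) %>%
  select(-clean_name)
```

`coalesce()` keeps the original name for IDs that are missing from `stations` instead of turning them into `NA`. The join assumes each `station_id` appears only once in `stations`; otherwise trips would be duplicated, which can be checked beforehand with `count(stations, station_id) %>% filter(n > 1)`.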
Any help is appreciated; I honestly have no idea where to start. If anyone knows how to do this with the tidyverse packages, that would be amazing, since that's what I am currently using. Also, if there's a better approach than what I have done so far, please let me know.
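If the lookup table itself is in doubt, one possible alternative is to derive it from the trips data, assuming the most frequent spelling for each ID is the correct one (an assumption, not something the data guarantees). A sketch with hypothetical sample data and table names:

```r
library(dplyr)

# Hypothetical trips table with several spellings per station_id
trips <- tibble(
  station_id   = c(1032, 1032, 1032, 1033, 1033, 1033),
  station_name = c("Public Rack - Kedvale Ave & 63rd St",
                   "Kedvale & 63rd",
                   "Public Rack - Kedvale Ave & 63rd St",
                   "Public Rack - Pulaski Rd & 65th St",
                   "Pulaski & 65th",
                   "Public Rack - Pulaski Rd & 65th St")
)

# Assumption: the most common spelling for each ID is the correct one
stations <- trips %>%
  count(station_id, station_name) %>%          # how often each spelling occurs
  group_by(station_id) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%   # keep one spelling per ID
  ungroup() %>%
  select(station_id, station_name)

# Replace every name with the chosen spelling for its ID
trips_clean <- trips %>%
  select(-station_name) %>%
  left_join(stations, by = "station_id")
```

`with_ties = FALSE` keeps exactly one name per ID even when two spellings are equally common, so IDs with ties may be worth checking by hand.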