Basically, I am trying to do some data cleaning. I am working with a data set from a bike-sharing company that lists every trip made on its system. Each ride has a station name column and a station id column, but in many rows the station_name is misspelled or written inconsistently.

I would like every station_id to have exactly one station_name across all rows, but right now it is common for an id to have 5 different names, and there are hundreds of ids.

I managed to create a new dataset that looks something like this:

station_id station_name
1032 Public Rack - Kedvale Ave & 63rd St
1033 Public Rack - Pulaski Rd & 65th St
1038 Public Rack - Kedzie Ave & 62nd Pl
1039 Public Rack - Kedzie Ave & 61st Pl

...

It lists each id and its station name once. I would like R to take the station_id of each row in the main dataset, look that id up in this new dataset, and replace whatever is in the main station_name with the station_name from the new dataset.

Any help is appreciated; I honestly have no idea where to start. If anyone knows how to do this using the tidyverse packages, that would be amazing, since that is what I am currently using. Also, if there's a better approach than what I have done so far, please let me know.
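
For context, a lookup table like the one above could be built roughly as follows. This is only a sketch under a strong assumption: that the most frequently used spelling for each id is the correct one, which may not hold for every station. The `trips` tibble and its values are made up for illustration; only `station_id` and `station_name` come from the real data.

```r
library(dplyr)

# Hypothetical stand-in for the main trips data; the values are invented.
trips <- tibble(
  ride_id = 1:6,
  station_id = c(1032, 1032, 1032, 1033, 1033, 1033),
  station_name = c(
    "Public Rack - Kedvale Ave & 63rd St",
    "Public Rack - Kedvale Ave & 63rd St",
    "Kedvale Ave & 63rd",
    "Public Rack - Pulaski Rd & 65th St",
    "Pulaski & 65th",
    "Public Rack - Pulaski Rd & 65th St"
  )
)

# ASSUMPTION: the most common spelling per id is treated as the correct one.
corrected_names <- trips %>%
  count(station_id, station_name, name = "n_rides") %>%  # rides per (id, name) pair
  group_by(station_id) %>%
  slice_max(n_rides, n = 1, with_ties = FALSE) %>%       # keep the top name per id
  ungroup() %>%
  select(station_id, station_name)
```

Checking a handful of ids by hand afterwards would help confirm that the chosen names really are the right ones.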

  • You’ll want to use a merge, e.g. with dplyr, `original_data %>% select(!station_name) %>% left_join(corrected_names)`. See this thread: [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – zephryl Feb 18 '23 at 01:56
  • Please provide enough code so others can better understand or reproduce the problem. – Community Feb 18 '23 at 06:30
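
A slightly expanded sketch of the join suggested in the first comment, on made-up data. The names `original_data` and `corrected_names` follow that comment; the rows, the `ride_id` column, and station id 1099 are hypothetical. Unlike the one-liner, this variant keeps the existing name when an id is missing from the lookup table instead of leaving it `NA`, which is an assumption about what is wanted.

```r
library(dplyr)

# Hypothetical main data with inconsistent names; values are invented.
original_data <- tibble(
  ride_id = 1:4,
  station_id = c(1032, 1032, 1033, 1099),
  station_name = c(
    "Kedvale & 63rd",
    "Public Rack - Kedvale Ave & 63rd St",
    "pulaski rd 65th",
    "Unlisted Station"
  )
)

# Lookup table: one canonical name per id (1099 is deliberately missing).
corrected_names <- tibble(
  station_id = c(1032, 1033),
  station_name = c(
    "Public Rack - Kedvale Ave & 63rd St",
    "Public Rack - Pulaski Rd & 65th St"
  )
)

cleaned <- original_data %>%
  # Join on the id only; the two station_name columns get suffixes.
  left_join(corrected_names, by = "station_id", suffix = c("_old", "_new")) %>%
  # Prefer the canonical name, fall back to the existing one if there is no match.
  mutate(station_name = coalesce(station_name_new, station_name_old)) %>%
  select(-station_name_old, -station_name_new)
```

Because `left_join()` keeps every row of the left table and the lookup has one row per id, `cleaned` has the same number of rows as `original_data`. If the lookup accidentally held duplicate ids, the join would duplicate rides, so it may be worth checking `count(corrected_names, station_id)` first.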

0 Answers