I have a big data frame, main_df, with a company_name column and several other variables. Some of the company names are misspelled, have typos, or otherwise need to be changed. Therefore, I am creating a vector of unique names using:
unique_names <- unique(levels(as.factor(main_df$company_name)))
This gives me a vector that looks something like this in the view window opened with view(unique_names):
V1:
Cosmonize Bulgaria Inc.
Crown One Foundation
Institut f�r Luft-und Raumfahrttechnik
Suppose, for instance, that Crown One Foundation changed its name to Crown Two Foundation. In this case, I would hard-code the change in main_df for all instances:
main_df$company_name[which(main_df$company_name == "Crown One Foundation")] <- "Crown Two Foundation"
This approach has worked well for all entries except the ones that show a replacement character, like Institut f�r Luft-und Raumfahrttechnik.
I've tried copying the entry from the view window:
main_df$company_name[which(main_df$company_name == "Institut f�r Luft-und Raumfahrttechnik")] <- "Institut fur Luft-und Raumfahrttechnik"
I've also tried printing the relevant element, unique_names[100], and using the printed result:
main_df$company_name[which(main_df$company_name == "Institut f\xfcr Luft-und Raumfahrttechnik")] <- "Institut fur Luft-und Raumfahrttechnik"
Neither approach worked. When I refresh the vector with unique_names <- unique(levels(as.factor(main_df$company_name))), nothing changes. Interestingly, when I search for Institute in the search box of the view window, the entry in question does not appear.
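To make the situation reproducible, here is a minimal sketch of how such an entry can be located and inspected. It assumes the problematic name contains the single byte 0xfc (as the printed f\xfcr suggests) and uses a small hypothetical stand-in for unique_names rather than my real data. Byte-wise matching with useBytes = TRUE is one way to find the entry even though the string is not valid UTF-8.

```r
# Hypothetical stand-in for unique_names. The last name is assembled from raw
# bytes so that it contains the lone byte 0xfc regardless of the session locale.
bad_name <- paste0("Institut f", rawToChar(as.raw(0xfc)),
                   "r Luft-und Raumfahrttechnik")
unique_names <- c("Cosmonize Bulgaria Inc.", "Crown One Foundation", bad_name)

# Match byte by byte so the invalid character does not get in the way.
hits <- grep("Institut", unique_names, fixed = TRUE, useBytes = TRUE)

# Look at the raw bytes of the first match; the 11th byte is the suspect one.
bytes <- charToRaw(unique_names[hits[1]])
print(hits)
print(bytes[11])
```

If the byte printed here is fc, the entry is most likely Latin-1 text rather than UTF-8, although that is an assumption about how the data was originally written.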
Another idea I had was to work with Encoding(). I used Encoding(unique_names[100]) to find that it is UTF-8. Using Encoding(unique_names[100]) <- 'latin1' changed the entry in the view window to Institut für Luft-und Raumfahrttechnik.
However, upon refreshing the unique entries using unique_names <- unique(levels(as.factor(main_df$company_name))), the entry is not updated.
Even then,
main_df$company_name[which(main_df$company_name == "Institut für Luft-und Raumfahrttechnik")] <- "Institut fur Luft-und Raumfahrttechnik"
doesn't lead to a change either (the replacement here is meant to remove the umlaut).
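For reference, here is a sketch of the kind of targeted fix I am after. It rests on two assumptions: that the affected entries are Latin-1 bytes, and that the refresh shows no change because Encoding() was set on the copy in unique_names rather than on main_df itself. The data frame below is a hypothetical stand-in for my real one. Only the entries flagged by validUTF8() are converted, so the rest of the column is left alone.

```r
# Hypothetical stand-in for main_df; bad_name carries the lone byte 0xfc.
bad_name <- paste0("Institut f", rawToChar(as.raw(0xfc)),
                   "r Luft-und Raumfahrttechnik")
main_df <- data.frame(
  company_name = c("Crown One Foundation", bad_name, bad_name),
  stringsAsFactors = FALSE
)

# Flag only the entries that are not valid UTF-8.
is_bad <- !validUTF8(main_df$company_name)

# Convert the flagged entries from Latin-1 (assumed) to UTF-8, writing the
# result back into main_df itself rather than into a separate vector.
main_df$company_name[is_bad] <- iconv(main_df$company_name[is_bad],
                                      from = "latin1", to = "UTF-8")

# With valid UTF-8 in place, an ordinary comparison can find the entry.
target <- "Institut f\u00fcr Luft-und Raumfahrttechnik"
main_df$company_name[main_df$company_name == target] <-
  "Institut fur Luft-und Raumfahrttechnik"

print(unique(main_df$company_name))
```

If the source encoding is something other than Latin-1, the from argument of iconv() would need to change accordingly.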
Am I looking at this the wrong way? I know there is a lot of hard-coding, and I've changed all entries besides the ones with the replacement character. Therefore, I don't want to change the encoding of the entire vector, but rather change these few dozen entries manually.
Thanks a lot in advance. I don't have a package preference and would appreciate any help.
Edit: Upon request, here is part of the output of dput(unique_names):
c("Aalborg University", "Aalto University", "Aarhus University", "ACDVE", "Aero LLC", "AgilitySpaceCorp", "Air Force Research Laboratory (AFRL)", "Airbus")
Here is dput(head(main_df$company_name)):
c("Aalborg University", "Aalborg University", "Aalborg University", "Aalborg University", "Aalborg University", "Aalborg University")