This is probably straightforward, but I can't figure out how to make this code work. Better knowledge of regex would likely help me out.
I have a list of URLs, many of which are from domains belonging to countries outside the US. I would like to filter out the ones that match a list of specific country codes. My list is based on a table found here: https://www.countries-ofthe-world.com/TLD-list.html
Starting from my original list of URLs, I separated them out so that one column holds just the top-level domain ending (.com, .net, etc.).
I then want R to go through my list, detect all of the URLs whose endings appear in that country-code list, and filter those out. However, it doesn't work the way I had hoped:
filtered_list <- df %>% filter(!str_detect(domain_ending, country$endings))
The idea is that it will take all the domain endings and keep the ones that don't match any entry in my list. I've tested a bunch of variations of this code, but I can't figure out why it removes some .coms and other endings that aren't even in my list, while keeping .de and others that I know should be filtered.
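As far as I can tell, str_detect() is vectorised over both of its arguments, so passing the whole country$endings column means each row of domain_ending gets compared against just one (recycled) pattern rather than against the whole set; that would explain the seemingly random keeps and drops. Here's a minimal sketch of what I think should work instead, using an exact match (the country and df objects below are stand-ins; my real endings come from the TLD table, stored without the leading dot):

library(dplyr)

# stand-in for the country-code endings read from the TLD table
country <- tibble(endings = c("de", "at", "cn", "fi", "au"))

# stand-in for my real data frame of separated URLs
df <- tibble(url = c("Facebook.com", "Twitter.de"),
             domain_ending = c("com", "de"))

# %in% tests each ending against the whole set of endings, no regex involved
filtered_list <- df %>% filter(!domain_ending %in% country$endings)

If a regex is really needed, the endings could first be collapsed into a single anchored pattern, something like str_detect(domain_ending, str_c("^(", str_c(country$endings, collapse = "|"), ")$")), but the %in% version sidesteps regex pitfalls entirely.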
Edit: here are some fictional variations on example websites to help with the code.
list <- c("Facebook.com", "Twitter.de", "Google.at", "Youtube.cn", "Instagram.fi", "Linkedin.com", "Wordpress.org", "Pinterest.au", "Wikipedia.org")
Supposing I wanted to take that vector and filter out all the endings that appear in the table linked above, how would I go about it? There's something wrong with my code somewhere, so maybe this example helps. My variables are all character class; might that make a difference?
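For what it's worth, here's a sketch of the whole pipeline applied to that fictional vector, extracting the ending after the last dot and then filtering with %in% (the country endings are again placeholders standing in for the real TLD table):

library(dplyr)
library(stringr)

# the example vector, renamed since `list` masks base::list
sites <- tibble(url = c("Facebook.com", "Twitter.de", "Google.at",
                        "Youtube.cn", "Instagram.fi", "Linkedin.com",
                        "Wordpress.org", "Pinterest.au", "Wikipedia.org"))

# placeholder country-code endings; the real set would come from the TLD table
country_endings <- c("de", "at", "cn", "fi", "au")

kept <- sites %>%
  mutate(domain_ending = str_extract(url, "[^.]+$")) %>%  # text after the last dot
  filter(!domain_ending %in% country_endings)

# kept$url is now: Facebook.com, Linkedin.com, Wordpress.org, Wikipedia.org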
Edit 2: I wrote the data out to a CSV file, read it back into R, and now it works. Sorry to waste everyone's time, and thanks for the help.