
I would like to programmatically read in a moderate number of Excel files, combine them into a single dataset, and write the result to a new file. I generally use list.files() and a for loop to do this, but in this case there are a few files that I don't want to include in the data read. I don't want to move them out of the directory, since it is not my project and organizing the data isn't up to me.
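
A minimal sketch of that workflow, assuming the workbooks sit somewhere under one top-level folder and that the first sheet of each file holds data with compatible columns. The helper names `list_wanted_files()` and `read_and_combine()`, the folder `C:/data`, and the output file name are hypothetical; the read and combine steps use `readxl::read_excel()` and `dplyr::bind_rows()`. The exclusion pattern wraps the folder names in slashes, an assumption that "no" and "negative" are always whole folder names, so a path that merely contains "no" (for example a file called "notes.xlsx") is not dropped by accident.

```r
# Hypothetical helper: list every .xlsx file under `dir` (including subfolders),
# then drop any path that matches the regular expression `exclude`.
list_wanted_files <- function(dir, exclude) {
  paths <- list.files(dir, pattern = "\\.xlsx$", full.names = TRUE,
                      recursive = TRUE)
  paths[!grepl(exclude, paths)]
}

# Hypothetical helper: read each workbook (first sheet only) and stack the
# results into one data frame. Assumes readxl and dplyr are installed.
read_and_combine <- function(paths) {
  tables <- lapply(paths, readxl::read_excel)
  dplyr::bind_rows(tables)
}

# Example usage ("C:/data" and "combined.csv" are placeholders):
# files    <- list_wanted_files("C:/data", exclude = "/(no|negative)/")
# combined <- read_and_combine(files)
# write.csv(combined, "combined.csv", row.names = FALSE)
```

Filtering the path vector before the read means the loop (or lapply()) never has to know which files were skipped.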

Here's an example using much shorter file paths than the real ones. The paths I'd like to remove contain the strings "no" and "negative". I think the code speaks for itself.

library(tidyverse)

vec1 <- c('C:/yes/file.xlsx','C:/yes/file.xlsx','C:/yes/file.xlsx',
          'C:/no/file.xlsx','C:/no/file.xlsx','C:/yes/file.xlsx',
          'C:/yes/file.xlsx','C:/yes/file.xlsx','C:/negative/file.xlsx')

vec1


rm.files <- vec1 %>% str_locate(c('no')) %>% data.frame() %>% 
  coalesce(vec1 %>% str_locate('negative') %>% data.frame()) %>% 
  na.omit() %>% 
  rownames() 
rm.files

With these indices of the file paths to remove, I'd like to drop those elements from vec1 before reading the data. I expected something like this to work:

vec1[-c(rm.files)]
dandrews
  • ```vec1[!grepl("no|negative", vec1)]```? – M-- Aug 09 '23 at 20:04
  • If you really want to stick to your solution, then this: ```vec1[-(stringr::str_locate(vec1, 'no|negative') %>% as.data.frame() %>% na.exclude() %>% rownames() %>% as.numeric())]``` – M-- Aug 09 '23 at 20:19
  • @M-- True, that works with one less step, and coding is all about efficiency. I simply don't use `grepl` and need to get better with it, as that's the simplest solution. I'm not sure why I'm getting downvotes on this: I discovered my solution while I worked on this question (as often happens when I try to carefully craft a question), but I didn't want to lose my solution, so I clicked [here](https://stackoverflow.blog/2011/07/01/its-ok-to-ask-and-answer-your-own-questions/?_ga=2.28145791.237628124.1691525515-346384911.1626886164) and figured that, if no one else, it would at least help me later on – dandrews Aug 09 '23 at 20:25
  • I thought this forum was for sharing knowledge; perhaps I should delete this question if it is misleading in some way – dandrews Aug 09 '23 at 20:27
  • Duplicate questions tend to get downvoted. Moreover, regarding your solution, you are basically missing that rownames are characters and not numbers. So, this can also be closed as a typo. I'd delete this, but you have an upvoted answer, so you cannot do that. – M-- Aug 09 '23 at 20:52
  • On a separate note, if you want to learn more about regex in R, here's a link: https://www.datacamp.com/tutorial/regex-r-regular-expressions-guide – M-- Aug 09 '23 at 21:07
  • @M-- I agree that the regex solution is better than mine; however, I'm struggling to understand your comment "that rownames are characters and not numbers". The whole point of my question is that I'm working with a vector, not a data frame, and (prior to knowing the regex solution) I was extracting the indices (numbers indicating locations in the vector) of the file paths to remove. My issue was that these were being extracted as characters and I couldn't code a solution that worked (until I figured it out as I formed this question, and so posted my own answer as was suggested in the link above). – dandrews Aug 10 '23 at 18:36

2 Answers


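`grepl` returns a logical vector marking which elements of the vector match the pattern, so negating it with `!` keeps only the paths that match neither "no" nor "negative":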

vec1[!grepl("no|negative", vec1)]
Azor Ahai -him-
  • OP is deciding whether to remove this thread or not, and this solution is available in the dupe-target. If you would, please consider deleting your answer to give them the opportunity to delete their post. Cheers. – M-- Aug 09 '23 at 21:02
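
The same filter can also be written with stringr, which the question already loads through the tidyverse. A sketch using `str_subset()` with `negate = TRUE` (available in stringr 1.4.0 and later). The extra path `C:/yes/notes.xlsx` is a hypothetical example, and the slash-wrapped pattern assumes "no" and "negative" only ever appear as whole folder names.

```r
library(stringr)

vec1 <- c('C:/yes/file.xlsx', 'C:/no/file.xlsx', 'C:/negative/file.xlsx',
          'C:/yes/notes.xlsx')  # last path is a hypothetical extra example

# Plain alternation: "notes.xlsx" is also dropped because it contains "no".
str_subset(vec1, "no|negative", negate = TRUE)
# returns "C:/yes/file.xlsx"

# Matching whole folder names between slashes keeps "notes.xlsx".
str_subset(vec1, "/(no|negative)/", negate = TRUE)
# returns "C:/yes/file.xlsx" "C:/yes/notes.xlsx"
```

With the short example paths in the question both patterns give the same result; the difference only shows up once real file names start containing the excluded words.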

This works as long as you convert the indices to numeric before subsetting, because rownames() returns a character vector:

library(tidyverse)

# rownames() gives character row names, so convert them before negative indexing
rm.files <- vec1 %>% str_locate('no') %>% data.frame() %>% 
  coalesce(vec1 %>% str_locate('negative') %>% data.frame()) %>% 
  na.omit() %>% 
  rownames() %>% 
  as.numeric()

vec1[-rm.files]
dandrews
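
One caveat with negative indexing, however the indices are produced: if no path matches, the index vector is empty, and `x[-integer(0)]` returns an empty vector rather than `x`. A sketch of two guards, using `which()` to get integer positions directly (no data frame or `rownames()` round trip) or a logical index, which keeps everything when nothing matches. The helper names `drop_by_position()` and `drop_by_logical()` are hypothetical.

```r
# Hypothetical helper: remove matching paths by integer position.
drop_by_position <- function(paths, pattern) {
  # which() returns integer positions, so no as.numeric() conversion is needed
  idx <- which(grepl(pattern, paths))
  # Guard: paths[-integer(0)] would return character(0), dropping everything
  if (length(idx) == 0) paths else paths[-idx]
}

# Hypothetical helper: remove matching paths with a logical index.
drop_by_logical <- function(paths, pattern) {
  # A logical index keeps every element when nothing matches
  paths[!grepl(pattern, paths)]
}

keep_all <- c('C:/yes/file.xlsx', 'C:/yes/file.xlsx')
keep_all[-which(grepl("no|negative", keep_all))]  # character(0): everything lost
drop_by_position(keep_all, "no|negative")         # both paths kept
drop_by_logical(keep_all, "no|negative")          # both paths kept
```

The logical version is the shorter of the two, which is one reason the `grepl` approach is usually preferred over building numeric indices.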