I want to repeat the same operations on multiple files that all have the same format [1:1259]. Each file has a column named Image, from which I want to extract a number and store it in a new column.

This is the code I want to apply to each of my files:

r<- regexpr("\\d+", Seg_grow_1mm.csv[,"Image"])
Seg_grow_1mm_01<- Seg_grow_1mm.csv %>%
  mutate(., new_id=(regmatches(Seg_grow_1mm.csv[,"Image"], r)))

Preview of Seg_grow_1mm_01

#   ID    Image                                                                    New_id
#   02     /Users/LLG/Data avec smoothing-margin/CHUM/05/Augmentation 3 mm/           05
#   03     /Users/LLG/Data avec smoothing-margin/CHUM/103/Augmentation 3 mm/          103
#   04     /Users/LLG/Data avec smoothing-margin/CHUM/145/Augmentation 3 mm/          145
# ....

I want to repeat this operation on each of my files. I tried a loop without success, and I don’t know how to turn it into a function so that I can use lapply on my list of files.

seg = list.files(path=csv, pattern="*.csv") # Seg[1:3]

for (i in 1:length(seg))
  assign(seg[i], read.csv(seg[i]))
for (x in seg)
      r<- regexpr("\\d+", x[,"Image"])
      mutate(., new_id=(regmatches( x[,"Image"], r)))

Error in x[, "Image"] : incorrect number of dimensions

I don't know what to put at the ??.

seg01<- lapply(seg, function (z)
  {r<- regexpr("\\d+", ?? [,"Image"])
  mutate(., new_id=(regmatches( ?? [,"Image"], r)))})

Thank you for the help!

  • https://stackoverflow.com/a/24376207/3358227 is a good discussion about working with lists-of-frames/tables. – r2evans Aug 09 '21 at 16:21
  • @r2evans, thank you for the link, but it's not helping me with my problem... – Tchat Cusson Aug 09 '21 at 16:41
  • Okay, sorry about that. Not sure what else I can suggest without an idea of what the data looks like though. You don't have to show all 1000+ columns to demonstrate what you need out of one. I'm not sure if this is a question about how to use `lapply`, how to deal with multiple files, or how to extract a number from a string (regex or otherwise). – r2evans Aug 09 '21 at 16:46
  • Sorry if it’s not clear … English is not my first language and I’m still new to R. I created a small preview of my file (dput(head(de) was horrible …). My question is really about how to apply the regular expression to every file in my list seg. I can do it for each file individually. For now, I have 3 files, so it’s easy to just copy and paste the code 3 times, even though that’s not efficient, but soon I will have 15 csv files in my list seg. – Tchat Cusson Aug 09 '21 at 17:06
  • So, you have a list of files named similar to `Seg_grow_1mm_01`. Those files share the same structure (as a data.frame) like shown in your preview. And now you want to extract the numbers in the `image` column (the `05`, `103`, `145` in your example) and put them into a new column for every data.frame? – Martin Gal Aug 09 '21 at 17:54
  • Sorry, I'm not being clear ... I have a list of files like Seg_grow_1mm.csv and I want to create a list of files like Seg_grow_1mm_01. The Seg_grow_1mm_01 file has the new column "new_id". – Tchat Cusson Aug 09 '21 at 18:03

1 Answer

You could use a tidyverse approach:

seg <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob

library(purrr)
library(readr)
library(dplyr)
library(stringr)

seg %>% 
  map(read_csv) %>% 
  map(~ .x %>% 
        mutate(new_id = str_extract(Image, "(?<=/)\\d+(?=/)"))) %>% 
  `names<-`(.,seg)

This creates a named list of data frames (tibbles):

$example1.csv
# A tibble: 3 x 3
  ID    Image                                                             new_id
  <chr> <chr>                                                             <chr> 
1 02    /Users/LLG/Data avec smoothing-margin/CHUM/05/Augmentation 3 mm/  05    
2 03    /Users/LLG/Data avec smoothing-margin/CHUM/103/Augmentation 3 mm/ 103   
3 04    /Users/LLG/Data avec smoothing-margin/CHUM/145/Augmentation 3 mm/ 145   

$example2.csv
# A tibble: 3 x 3
     ID Image                                           new_id   
  <dbl> <chr>                                           <chr>    
1    23 /example/directory/1983/Augmentation 3 mm/      1983     
2    42 /example/directory/105123/Augmentation 3 mm/    105123   
3    99 /example/directory/151252145/Augmentation 3 mm/ 151252145

This output is based on my two example files. You could use assign() to create separate data frames in your global environment, but that shouldn't be necessary.
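For comparison, here is a minimal base R sketch of the lapply() approach from the question: the function receives a file path, reads it, and the resulting data frame takes the place of the ??. The temporary directory, the two example files, and the helper name add_new_id are hypothetical and only there for illustration. The sketch keeps the question's "\\d+" pattern, which picks the first run of digits in each path.

```r
# Hypothetical example data: two small CSV files in a temporary directory.
csv_dir <- file.path(tempdir(), "seg_example")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(
  data.frame(ID = c(2, 3),
             Image = c("/Users/LLG/Data avec smoothing-margin/CHUM/05/Augmentation 3 mm/",
                       "/Users/LLG/Data avec smoothing-margin/CHUM/103/Augmentation 3 mm/")),
  file.path(csv_dir, "example1.csv"), row.names = FALSE)
write.csv(
  data.frame(ID = c(23, 42),
             Image = c("/example/directory/1983/Augmentation 3 mm/",
                       "/example/directory/105123/Augmentation 3 mm/")),
  file.path(csv_dir, "example2.csv"), row.names = FALSE)

# Hypothetical helper: read one file and add the new_id column.
add_new_id <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  r <- regexpr("\\d+", df[, "Image"])
  # regmatches() drops non-matching rows, so this assumes every Image contains digits.
  df$new_id <- regmatches(df[, "Image"], r)
  df
}

# list.files() returns file names (character strings), not data frames,
# so each file has to be read inside the function.
seg <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
seg01 <- lapply(seg, add_new_id)
names(seg01) <- basename(seg)

seg01[["example1.csv"]]$new_id
```

The loop in the question fails because x is a file name, and a character string cannot be indexed with x[, "Image"]; reading the file inside the function avoids that.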

If you want to write this list back into separate .csv files, you could use

seg %>% 
  map(read_csv) %>% 
  map(~ .x %>% 
        mutate(new_id = str_extract(Image, "(?<=/)\\d+(?=/)"))) %>% 
  `names<-`(.,seg) %>%
  map2(.x = .,
       .y = paste0("new_", seg),
       ~ write_csv(x = .x, file = .y))

This writes one file per input to your current working directory, each named with the prefix new_ followed by the old filename. If you want filenames like "oldfilename_01.csv" instead, replace paste0("new_", seg) with str_replace(seg, "\\.csv$", "_01.csv").
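It can help to preview the output filenames before writing anything. The following is a small base R sketch with hypothetical filenames; sub() is used here as a base R stand-in for str_replace().

```r
# Hypothetical input filenames, standing in for the result of list.files().
seg <- c("example1.csv", "example2.csv")

# paste() separates its arguments with a space by default; paste0() does not.
with_space <- paste("new_", seg)   # "new_ example1.csv" "new_ example2.csv"
prefixed   <- paste0("new_", seg)  # "new_example1.csv"  "new_example2.csv"

# Insert a suffix before the extension instead of adding a prefix.
suffixed <- sub("\\.csv$", "_01.csv", seg)  # "example1_01.csv" "example2_01.csv"

print(prefixed)
print(suffixed)
```

Because paste() inserts a space, paste0() is the safer choice when building filenames.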

Martin Gal
  • Hi @Martin Gal, thanks for the answer. I don't understand how I can read each individual file. And how do I write a csv for each element of the list? – Tchat Cusson Aug 09 '21 at 18:53
  • Your code uses the `list.files()` function, which creates a vector of filenames. We use this vector to read the `.csv` files into a list of data.frames (that's the `map(read_csv)` part). You may have to set the working directory to the directory containing the `.csv` files first. You can do so with `setwd("PATHtoYOURfiles")`. – Martin Gal Aug 09 '21 at 18:59
  • @TchatCusson I edited my answer to clarify it. Hope this helps. – Martin Gal Aug 09 '21 at 19:10
  • Where can I find more useful information on map()? – Tchat Cusson Aug 09 '21 at 19:33
  • This is a good start: https://www.rebeccabarter.com/blog/2019-08-19_purrr/ – Martin Gal Aug 09 '21 at 19:33