
I have datasets formatted in a way represented by the set below:

FirstName        Letter
Alexsmith        A1
ThegreatAlex     A6
AlexBobJones1    A7
Bobsmiles222     A1
Christopher      A9
Christofer       A6

I want to change it to this:

School        FirstName        Letter
Greenfield    Alexsmith        A1
Greenfield    ThegreatAlex     A6
Greenfield    AlexBobJones1    A7
Greenfield    Bobsmiles222     A1
Greenfield    Christopher      A9
Greenfield    Christofer       A6

I want to add a leftmost column indicating which school the dataset comes from. I am importing this data from CSV files into R to begin with, and the filenames already have the school name in them.

Is it possible to retrieve the school name from the file name? The filenames are formatted like this: SCHOOLNAME_1, SCHOOLNAME_2, etc. The numbers do not need to be retained.

My goal here is to automate this process through a loop because of how many of these datasets I will be accumulating, which is why I am starting small with this question.

I tried something like this:

School <- c(length(schoolimport))

but don't know how to add in the values of each cell

Thank you & I am happy to clarify anything

user7264
  • 2
    Have a look at my answer at [How to make a list of data frames?](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames). If you read all your files into a `list`, you can assign names to the list based on the filename, `names(my_list) <- my_files`, then combine all the data frames into one, keeping the filename, `my_data <- dplyr::bind_rows(my_list, .id = "School")` and then strip off any trailing `_Number` with `my_data %>% mutate(School = sub(pattern = "_[0-9]+", replacement = "", School))`. – Gregor Thomas Jul 13 '22 at 17:15
  • Can you elaborate how to make the names of the list to the filename with `names(my_list) <- my_files` ? – user7264 Jul 14 '22 at 18:55
  • Presumably you have a vector of file names to read in at some point. You use that. – Gregor Thomas Jul 14 '22 at 19:05
  • @GregorThomas Ah yes, that is my current problem. I am trying to read in and assign names from the datafile names directly. Stuck on str_replace, I will make a new post about this. Thank u for the help – user7264 Jul 14 '22 at 19:07
  • @GregorThomas Can you elaborate on the sub(pattern = " " syntax? Where can I learn how to dictate this to pick and choose what I want to keep from the filepath? i.e. where can I learn about what the _[0-9]+ syntax means? – user7264 Jul 14 '22 at 19:25
  • 1
    It's called "regular expressions" or "regex". Look up "regex tutorial" or "introduction to regex". I like using regex101.com to debug regex patterns. Though I think a lot of people just ask on here when they need help writing regex ;) – Gregor Thomas Jul 14 '22 at 19:29
  • Haha I am on question cooldown so I will ask when i can if I cant figure it out myself :) Thanks – user7264 Jul 14 '22 at 19:48
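The steps described in the comments above can be sketched roughly as follows. This is only an illustration under some assumptions: the folder `demo_dir` and the two `Greenfield_*.csv` files are made up for the demo, and the file names are stripped of their folder and extension before being used as list names. In the pattern `_[0-9]+`, `_` matches a literal underscore, `[0-9]` matches any single digit, and `+` means "one or more of the preceding item"; the trailing `$` added here anchors the match to the end of the name.

```r
library(dplyr)

# Hypothetical demo files standing in for SCHOOLNAME_1, SCHOOLNAME_2, ...
demo_dir <- tempfile("schools_")
dir.create(demo_dir)
write.csv(data.frame(FirstName = c("Alexsmith", "ThegreatAlex"),
                     Letter = c("A1", "A6")),
          file.path(demo_dir, "Greenfield_1.csv"), row.names = FALSE)
write.csv(data.frame(FirstName = "Christopher", Letter = "A9"),
          file.path(demo_dir, "Greenfield_2.csv"), row.names = FALSE)

# The vector of file names to read in
my_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames
my_list <- lapply(my_files, read.csv)

# Name each list element after its file (folder and extension removed)
names(my_list) <- tools::file_path_sans_ext(basename(my_files))

# Stack the data frames, keep the list names in a School column,
# then strip the trailing _<number>
my_data <- bind_rows(my_list, .id = "School") %>%
  mutate(School = sub(pattern = "_[0-9]+$", replacement = "", School))

print(my_data)
```

Because `.id = "School"` is filled from the list names, the naming step is what carries the file name into the combined data frame.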

1 Answer


Assuming you want them all in the same data frame, my suggestion would be to use the functions purrr::map_dfr and fs::dir_ls. The files will need to be in the same format for this to work.

Put the files in their own folder, then do

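library(fs)
library(purrr)
library(readr)

# folder_name is the path to the folder that holds the CSV files
list_of_files <- dir_ls(folder_name)

# Read every file and stack the results; .id stores each file path
list_of_files |>
    map_dfr(read_csv, .id = 'school_name')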

This will return a single combined data frame with each file's path added as a column called 'school_name'. You could then use regular expressions to extract the school name from that path.
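As a rough sketch of that last step, the school name can be pulled out of each path with base R alone. The paths below are hypothetical examples (including the made-up school "Oakwood"), chosen to resemble what `dir_ls` might return for files named SCHOOLNAME_1, SCHOOLNAME_2, and so on.

```r
# Hypothetical file paths, as they might appear in the school_name column
school_name <- c("data/Greenfield_1.csv",
                 "data/Greenfield_2.csv",
                 "data/Oakwood_10.csv")

# Drop the folder part of each path
school <- basename(school_name)

# Drop the file extension
school <- tools::file_path_sans_ext(school)

# Drop the trailing underscore and number
school <- sub("_[0-9]+$", "", school)

print(school)
# [1] "Greenfield" "Greenfield" "Oakwood"
```

Doing the cleanup in three small steps keeps the regular expression simple, since it only has to deal with the `_<number>` suffix.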

Evan FNG
  • 1
    Thanks for the reply @Evan-FNG, my CSV files use "^" as the delimiter, how do I tell map_dfr this? Thanks – user7264 Jul 14 '22 at 18:13
  • 2
    @samism Replace `read_csv` in this answer with whatever command you have already used to read in the files. For example if you had used `read.table("file_name.csv", sep = "^")` you can use `map_dfr(read.table, sep = "^")`. – Gregor Thomas Jul 14 '22 at 19:07
  • To add to what @GregorThomas said, when you use additional arguments for a function that you're applying with `map_*`, you'll need to use lambda notation: `list_of_files |> map_dfr(\(x) read.table(x, sep = '^'), .id = 'school_name')` – Evan FNG Jul 14 '22 at 20:05
  • 1
    @EvanFNG no need for lambdas here. All the `purrr::map_*` functions have a `...` argument that will be passed on to the `.f` function. `map_dfr(read.table, sep = "^", .id = "school_name")` will work just fine. – Gregor Thomas Jul 14 '22 at 20:08
  • @GregorThomas Oh cool! Never realized that they had already thought of that. – Evan FNG Jul 14 '22 at 20:18
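The two styles discussed in these comments can be compared side by side. The sketch below makes a few assumptions: the caret-delimited demo files are invented, `header = TRUE` is added on the assumption that each file starts with a header row, and `map_dfr` needs the dplyr package to be installed. In purrr 1.0.0 and later `map_dfr` is marked as superseded, though it still works.

```r
library(purrr)

# Hypothetical caret-delimited demo files
demo_dir <- tempfile("caret_")
dir.create(demo_dir)
writeLines(c("FirstName^Letter", "Alexsmith^A1", "ThegreatAlex^A6"),
           file.path(demo_dir, "Greenfield_1.csv"))
writeLines(c("FirstName^Letter", "Christopher^A9"),
           file.path(demo_dir, "Greenfield_2.csv"))

# Named vector of paths; the names end up in the .id column
list_of_files <- list.files(demo_dir, full.names = TRUE)
names(list_of_files) <- basename(list_of_files)

# Extra arguments passed through map_dfr's ... to read.table
via_dots <- map_dfr(list_of_files, read.table,
                    sep = "^", header = TRUE, .id = "school_name")

# The same call written with an anonymous function
via_lambda <- map_dfr(list_of_files,
                      \(x) read.table(x, sep = "^", header = TRUE),
                      .id = "school_name")

# Check that both styles give the same result
print(identical(via_dots, via_lambda))
```

Either form should be fine here; the anonymous function is mainly useful when the file path is not the first argument of the reading function.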