Add a column of length matching that of another dataframe AND adjust the value of that column in each row depending on a filename

Question

I have datasets formatted in a way represented by the set below:

FirstName      Letter
Alexsmith      A1
ThegreatAlex   A6
AlexBobJones1  A7
Bobsmiles222   A1
Christopher    A9
Christofer     A6

I want to change it to this:

School      FirstName      Letter
Greenfield  Alexsmith      A1
Greenfield  ThegreatAlex   A6
Greenfield  AlexBobJones1  A7
Greenfield  Bobsmiles222   A1
Greenfield  Christopher    A9
Greenfield  Christofer     A6

I want to add a leftmost column indicating which school the dataset comes from. I am importing this data from CSV files into R, and the filenames already contain the school name.

Is it possible to retrieve the school name from the file name? The filenames are formatted like this: SCHOOLNAME_1, SCHOOLNAME_2, etc. The numbers do not need to be retained.

My goal here is to automate this process through a loop because of how many of these datasets I will be accumulating, which is why I am starting small with this question.

I tried something like this:

School <- c(length(schoolimport))

but don't know how to add in the values of each cell

Thank you & I am happy to clarify anything

Have a look at my answer at [How to make a list of data frames?](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames). If you read all your files into a `list`, you can assign names to the list based on the filename, `names(my_list) <- my_files`, then combine all the data frames into one, keeping the filename, `my_data <- dplyr::bind_rows(my_list, .id = "School")` and then strip off any trailing `_Number` with `my_data %>% mutate(School = sub(pattern = "_[0-9]+", replacement = "", School))`. — Gregor Thomas, Jul 13 '22 at 17:15
Can you elaborate how to make the names of the list to the filename with `names(my_list) <- my_files` ? — user7264, Jul 14 '22 at 18:55
Presumably you have a vector of file names to read in at some point. You use that. — Gregor Thomas, Jul 14 '22 at 19:05
@GregorThomas Ah yes, that is my current problem. I am trying to read in and assign names from the datafile names directly. Stuck on str_replace, I will make a new post about this. Thank u for the help — user7264, Jul 14 '22 at 19:07
@GregorThomas Can you elaborate on the sub(pattern = " " syntax? Where can I learn how to dictate this to pick and choose what I want to keep from the filepath? i.e. where can I learn about what the _[0-9]+ syntax means? — user7264, Jul 14 '22 at 19:25
It's called "regular expressions" or "regex". Look up "regex tutorial" or "introduction to regex". I like using regex101.com to debug regex patterns. Though I think a lot of people just ask on here when they need help writing regex ;) — Gregor Thomas, Jul 14 '22 at 19:29
Haha I am on question cooldown so I will ask when i can if I cant figure it out myself :) Thanks — user7264, Jul 14 '22 at 19:48
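
For readers following the comments above, here is a minimal sketch of the "list of data frames" approach. It uses base R only (`lapply`, `Map`, `rbind`) in place of the `dplyr::bind_rows(..., .id = "School")` call from the comment, so it runs without extra packages. The temporary folder, the two demo files, and the school name `Oakridge` are made up for illustration, and the `$` anchor at the end of the pattern is an addition to the pattern given in the comment.

```r
# Hypothetical setup: write two small CSV files into a temporary folder so the
# sketch runs on its own. Real files would already exist on disk.
folder <- file.path(tempdir(), "schools_demo")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(FirstName = c("Alexsmith", "ThegreatAlex"),
                     Letter = c("A1", "A6")),
          file.path(folder, "Greenfield_1.csv"), row.names = FALSE)
write.csv(data.frame(FirstName = "Christopher", Letter = "A9"),
          file.path(folder, "Oakridge_2.csv"), row.names = FALSE)

# The vector of file names, used both for reading and for naming the list.
my_files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list, then name each element after its file
# (directory and extension removed).
my_list <- lapply(my_files, read.csv)
names(my_list) <- tools::file_path_sans_ext(basename(my_files))

# Prepend a School column to each data frame, then stack them into one.
with_school <- Map(function(df, nm) cbind(School = nm, df),
                   my_list, names(my_list))
my_data <- do.call(rbind, unname(with_school))
rownames(my_data) <- NULL

# Strip the trailing _number from the file-derived name
# ("$" anchors the match to the end of the string).
my_data$School <- sub("_[0-9]+$", "", my_data$School)
my_data
```

The key step is `names(my_list) <- ...`: once each list element carries its file name, that name can be turned into a column when the data frames are combined.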

score 1 · Accepted Answer · answered Jul 13 '22 at 17:41

Assuming you want them all in the same data frame, my suggestion would be to use the functions purrr::map_dfr and fs::dir_ls. The files will need to be in the same format for this to work.

Put the files in their own folder, then do

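# folder_name is the path to the folder that holds the csv files
list_of_files <- fs::dir_ls(folder_name)

# read every file and stack the results; .id stores each file's path in a new column
list_of_files |>
    purrr::map_dfr(readr::read_csv, .id = 'school_name')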

This will return a single combined data frame with each file's path added as a column called 'school_name'. You could then use regular expressions to extract the school name from that column.
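
As a rough sketch of that last step, assuming the `school_name` column holds paths shaped like the hypothetical ones below (the `data/` folder and the name `Oakridge` are invented for illustration), the school name can be pulled out with base R alone:

```r
# Hypothetical values of the kind that end up in the school_name column.
school_name <- c("data/Greenfield_1.csv",
                 "data/Greenfield_2.csv",
                 "data/Oakridge_10.csv")

# 1. Drop the directory part of each path.
file_only <- basename(school_name)

# 2. Drop the file extension.
no_ext <- tools::file_path_sans_ext(file_only)

# 3. Drop the trailing underscore and number. In the pattern:
#    _      matches a literal underscore
#    [0-9]  matches a single digit
#    +      means "one or more of the preceding item"
#    $      anchors the match to the end of the string
school <- sub("_[0-9]+$", "", no_ext)

school
```

Anchoring with `$` is a design choice here: it keeps the pattern from touching an underscore-plus-digits sequence that happens to appear earlier in a school name.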

Evan FNG

Thanks for the reply @Evan-FNG. My CSV files use "^" as the delimiter; how do I tell map_dfr this? Thanks – user7264 Jul 14 '22 at 18:13
@samism Replace `read_csv` in this answer with whatever command you have already used to read in the files. For example if you had used `read.table("file_name.csv", sep = "^")` you can use `map_dfr(read.table, sep = "^")`. – Gregor Thomas Jul 14 '22 at 19:07
To add to what @GregorThomas said, when you use additional arguments for a function that you're applying with `map_*`, you'll need to use lambda notation: `list_of_files |> map_dfr(\(x) read.table(x, sep = '^'), .id = 'school_name')` – Evan FNG Jul 14 '22 at 20:05
@EvanFNG no need for lambdas here. All the `purrr::map_*` functions have a `...` argument that will be passed on to the `.f` function. `map_dfr(read.table, sep = "^", .id = "school_name")` will work just fine. – Gregor Thomas Jul 14 '22 at 20:08
@GregorThomas Oh cool! Never realized that they had already thought of that. – Evan FNG Jul 14 '22 at 20:18
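
To make the two forms from this exchange concrete, here is a small sketch that swaps in base R's `lapply` for `map_dfr`, so it runs without purrr. `lapply` forwards extra arguments through `...` in the same way. The temporary file and its contents are hypothetical, and `header = TRUE` is an assumption about the file layout that does not come from the thread.

```r
# Hypothetical "^"-delimited file written to a temporary location.
path <- file.path(tempdir(), "Greenfield_1.txt")
writeLines(c("FirstName^Letter", "Alexsmith^A1", "ThegreatAlex^A6"), path)

# Form 1: arguments given after the function are forwarded to read.table.
via_dots <- lapply(path, read.table, sep = "^", header = TRUE)

# Form 2: an explicit anonymous function that makes the same call.
via_lambda <- lapply(path, \(x) read.table(x, sep = "^", header = TRUE))

# Both forms read the file the same way.
identical(via_dots, via_lambda)
```

With purrr the shape is the same as in the comments above: the extra arguments go after the function name, alongside `.id`.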