I am trying to learn how to use R/RStudio for a project. Some of the initial tasks I will be using R for are described below, and I would be very grateful for a resource that teaches me how to perform them.

I have a column of unique identifiers in one Excel document (document A), i.e. a, b, and c. I have another Excel document for each of these identifiers, named after the identifier. For each unique ID, I want to look up the spreadsheet with the matching name and, from that spreadsheet, retrieve the first and final values in a certain column, as well as the mean and maximum of that column.

I am interested in finding a resource that will teach me to do all this and more, and I don't mind investing time to learn, i.e. I am not in a rush to do this.

After this step, I have something more complicated I want to do.

I have another document (document B) with a column of identifiers, but here each identifier is repeated multiple times. So again, using the first document with the list of identifiers, I want to search through document B and retrieve values from the rows where each identifier appears for the first and last time in the column.

If you have a resource I can study to learn to do all this and more I would be very grateful. Thank you.

jamdiel
  • Explore the intro guides for tidyverse. Tidyverse is a collection of packages that have all the tools to merge/join, filter and summarise data. Everything you want to do is covered in the packages there (dplyr, tidyr and so on). – Fausto Carvalho Marques Silva Mar 06 '20 at 12:28

1 Answer

R offers multiple ways to do what you want, and once you understand the basics you will probably find it easy to implement a solution for the tasks you described.

Besides learning the R basics, I'd also suggest looking at the tidyverse collection of packages. Its package dplyr offers an easy-to-write and easy-to-read way of structuring code and, together with tidyr, almost all the functions you'll ever need for your day-to-day data wrangling (including the tasks mentioned in your question).

  • An Introduction to R - CRAN An official intro to the basics of R. While you would probably use alternative solutions for many of the examples here, I think it's very useful to have read the basics at least once.

  • tidyverse Here you will find links (by clicking on the icons) to the tidyverse packages, notably ggplot2 (probably the plotting package in R), the aforementioned dplyr and tidyr, and readxl, a package to read data from Excel files.

Just to give you a glimpse into the future, the workflow to solve the tasks from the question could look something like this:

  1. Read data from the Excel file with the unique identifiers using readxl::read_excel
  2. Loop through the identifiers and load the corresponding files
  3. Use dplyr::summarise with mean, max, dplyr::first, and dplyr::last to get the four values from the column
  4. Proceed similarly for document B, maybe using dplyr::group_by together with dplyr::first and dplyr::last
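
Steps 3 and 4 could be sketched roughly as follows. This is only an illustration under assumptions: the column names `id` and `value`, the helper names `summarise_values` and `first_last_by_id`, and the small in-memory data frames standing in for the spreadsheets are all made up here, not taken from the question. It uses dplyr::summarise, which collapses a column to a single row per data frame (or per group).

```r
# Assumes the dplyr package is installed; functions are called with the
# dplyr:: prefix, so no library() call is needed.

# Step 3: one row with first, last, mean and max of the (hypothetical)
# column `value`.
summarise_values <- function(data) {
  dplyr::summarise(
    data,
    first_value = dplyr::first(value),
    last_value  = dplyr::last(value),
    mean_value  = mean(value),
    max_value   = max(value)
  )
}

# Step 4: keep only the IDs listed in document A, then take the first and
# last `value` for each ID (rows are used in their original order).
first_last_by_id <- function(data, ids) {
  kept <- dplyr::filter(data, id %in% ids)
  dplyr::summarise(
    dplyr::group_by(kept, id),
    first_value = dplyr::first(value),
    last_value  = dplyr::last(value)
  )
}

# Stand-in for one per-ID spreadsheet.
sheet_a <- data.frame(value = c(2, 9, 4))
per_file <- summarise_values(sheet_a)

# Stand-in for document B, where IDs repeat; "z" is not in document A.
doc_b <- data.frame(
  id    = c("a", "b", "a", "z", "b", "a"),
  value = c(1, 10, 2, 99, 20, 3)
)
per_id <- first_last_by_id(doc_b, ids = c("a", "b", "c"))
```

Here `per_file` holds a single row for the one spreadsheet, and `per_id` holds one row per identifier found in both documents. With real data, the data frames would come from readxl::read_excel instead of data.frame.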
dario
  • Tyvm. I read R for Data Science and Hands-On Programming with R, which covered what you mentioned. So I've managed step 1 by creating an Excel file with only the unique IDs, and using the read_excel function to create a vector of the unique IDs. However, I am having trouble looping through these IDs to load the corresponding files. I set my working directory to the relevant folder, and tried using "for (i in 1:nrow(pi)){ read_excel() }" where pi contains the IDs, but I don't know what to put in the read_excel function this time, i.e. it should be the path, but it has to change every time? – jamdiel Mar 15 '20 at 22:19
  • We need to learn to walk before we can jump ;) I highly suggest really going through the resources I suggested in my answer (or others). The question you are asking now is a *really, really* basic one, so forgive me for having doubts about how much of the basics you really tried to learn... But anyway, there are many, many answers to that question on SO, for example: [1](https://stackoverflow.com/questions/32888757/how-can-i-read-multiple-excel-files-into-r), [2](https://stackoverflow.com/questions/54595433/merging-multiple-xls-files-in-r?noredirect=1&lq=1) and many more... – dario Mar 16 '20 at 07:04
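
For the loop asked about in the comments, one possible sketch is to build each path from the ID itself and pass that to read_excel. The assumptions here are that every file is named "<id>.xlsx", that document A is called "document_A.xlsx" with a column `id`, and the helper names `id_to_path` and `read_all_ids` are invented for illustration. Note that read_excel returns a data frame rather than a vector, so the ID column has to be pulled out first.

```r
# Assumes the readxl package is installed; it is only needed once
# read_all_ids() is actually called.

# Build the file path for one or more IDs, e.g. "a" -> "./a.xlsx".
id_to_path <- function(id, dir = ".") {
  file.path(dir, paste0(id, ".xlsx"))
}

# Read every per-ID file into a named list of data frames.
read_all_ids <- function(ids, dir = ".") {
  sheets <- lapply(ids, function(id) readxl::read_excel(id_to_path(id, dir)))
  names(sheets) <- ids
  sheets
}

# The paths that would be read for three IDs in the working directory.
paths <- id_to_path(c("a", "b", "c"))

# Usage once the files exist (hypothetical file and column names):
# ids    <- readxl::read_excel("document_A.xlsx")$id
# sheets <- read_all_ids(ids)
# sheets[["a"]]   # the data frame read from a.xlsx
```

Using lapply instead of a bare for loop keeps each result, whereas calling read_excel inside a for loop without assigning its value discards what was read.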