
I want to import multiple .csv files from a folder and sort them into distinct data frames based on the file name.

The file names follow the pattern chX_imgN_chYROI, where X and Y are 1, 2 or 3, and N is 1, 2, 3, 4 or 5. N does not matter, because I want to combine the .csv files by each distinct combination of X and Y (e.g. ch1_ch2ROI <- ch1_img1_ch2ROI, ch1_img2_ch2ROI, ..., ch1_img5_ch2ROI).

I'm a novice, so any suggestions or insights would be helpful. Thanks!

nano
  • nano, [here](https://stackoverflow.com/q/17499013/3358272)'s a link that covers not just "how to import multiple csv files", but also some data management with those resulting frames. Namely, if you are going to do similar things to each frame, it's usually better to store them as a "list of frames" and then use `lapply` or similar constructs to iterate one task over each frame within it. – r2evans Aug 06 '20 at 16:33

1 Answer


The first part of this question (importing multiple csv files) is really a duplicate of "How do I make a list of data frames?".

But the second part -- combining some frames -- is a little different. I'll generate some sample data.

From the duplicate part, you'd probably use something like below to read in the files:

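# somedir is a placeholder for the folder that holds the csv files.
# The "." before "csv" is escaped so it matches a literal dot, and "$" anchors the extension.
alldat <- sapply(list.files(somedir, pattern = "ch.*_img.*_ch.*\\.csv$", full.names = TRUE),
                 read.csv, stringsAsFactors = FALSE,
                 simplify = FALSE)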

Even if you just use this code blindly, I still recommend reading the answers to "How do I make a list of data frames?", as the advice and methodology there are efficient and very idiomatic to R. Done correctly, they can make many workflows significantly easier to visualize, understand, and maintain.
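As a rough, self-contained sketch of that import step: the temporary folder, the demo file names, and the stricter anchored pattern below are assumptions made for illustration, not part of your setup. It also strips the path and extension so that the list names look like chX_imgN_chYROI.

```r
# Assumption: a throwaway folder with a few tiny csv files stands in for the real one.
somedir <- file.path(tempdir(), "roi_demo")
dir.create(somedir, showWarnings = FALSE)
demo_names <- c("ch1_img1_ch2ROI", "ch1_img2_ch2ROI", "ch2_img1_ch1ROI")
for (nm in demo_names) {
  write.csv(head(mtcars, 2), file.path(somedir, paste0(nm, ".csv")), row.names = FALSE)
}
# An unrelated file that the pattern should skip.
write.csv(head(mtcars, 2), file.path(somedir, "notes.csv"), row.names = FALSE)

# Anchored pattern: digits only for X, N and Y, and a literal ".csv" at the end.
files <- list.files(somedir,
                    pattern = "^ch[0-9]+_img[0-9]+_ch[0-9]+ROI\\.csv$",
                    full.names = TRUE)

# Read every file into a list of frames, then name each frame after its file
# (without the directory or the extension).
alldat <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(alldat) <- tools::file_path_sans_ext(basename(files))
names(alldat)
```

Naming the list elements by the bare file name keeps the later grouping step independent of where the folder lives.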

To mimic the import process, I'll use this fake data:

alldat <- list(
  "ch1_img1_ch1ROI" = mtcars[1:2,],
  "ch1_img1_ch2ROI" = mtcars[3:4,],
  "ch1_img2_ch1ROI" = mtcars[5:6,],
  "ch2_img1_ch1ROI" = mtcars[7:8,],
  "ch2_img1_ch2ROI" = mtcars[9:10,],
  "ch2_img2_ch2ROI" = mtcars[11:12,]
)
alldat
# $ch1_img1_ch1ROI
#               mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
# Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
# $ch1_img1_ch2ROI
#                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# $ch1_img2_ch1ROI
#                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
# Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.46 20.22  1  0    3    1
# $ch2_img1_ch1ROI
#             mpg cyl  disp  hp drat   wt  qsec vs am gear carb
# Duster 360 14.3   8 360.0 245 3.21 3.57 15.84  0  0    3    4
# Merc 240D  24.4   4 146.7  62 3.69 3.19 20.00  1  0    4    2
# $ch2_img1_ch2ROI
#           mpg cyl  disp  hp drat   wt qsec vs am gear carb
# Merc 230 22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# Merc 280 19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# $ch2_img2_ch2ROI
#             mpg cyl  disp  hp drat   wt qsec vs am gear carb
# Merc 280C  17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
# Merc 450SE 16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3

By your logic, we have some combinations of X/Y that are unique and some that have multiple N's. Let's group solely by X/Y combinations.

  1. First, we'll extract the X and Y components into a unique string for each filename:

    gsub(".*ch([0-9]+)_.*ch([0-9]+).*", "\\1_\\2", names(alldat))
    # [1] "1_1" "1_2" "1_1" "2_1" "2_2" "2_2"
    

    Notice that we have some frames that need to be combined, namely elements 1 and 3, and elements 5 and 6.

  2. Split the list of frames by this string. The result is a list of 4 elements, each of which is a nested list of 1 or more frames.

    spllists <- split(alldat, gsub(".*ch([0-9]+)_.*ch([0-9]+).*", "\\1_\\2", names(alldat)))
    str(spllists, max.level = 2)
    # List of 4
    #  $ 1_1:List of 2
    #   ..$ ch1_img1_ch1ROI:'data.frame':   2 obs. of  11 variables:
    #   ..$ ch1_img2_ch1ROI:'data.frame':   2 obs. of  11 variables:
    #  $ 1_2:List of 1
    #   ..$ ch1_img1_ch2ROI:'data.frame':   2 obs. of  11 variables:
    #  $ 2_1:List of 1
    #   ..$ ch2_img1_ch1ROI:'data.frame':   2 obs. of  11 variables:
    #  $ 2_2:List of 2
    #   ..$ ch2_img1_ch2ROI:'data.frame':   2 obs. of  11 variables:
    #   ..$ ch2_img2_ch2ROI:'data.frame':   2 obs. of  11 variables:
    
  3. Iterate (`lapply`) over the outer list, combining the frames in each inner list. To row-combine the frames of a single inner list, use `do.call(rbind, ...)`, shown here on the first element:

    spllists[[1]]
    # $ch1_img1_ch1ROI
    #               mpg cyl disp  hp drat    wt  qsec vs am gear carb
    # Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
    # Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
    # $ch1_img2_ch1ROI
    #                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
    # Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
    # Valiant           18.1   6  225 105 2.76 3.46 20.22  1  0    3    1
    do.call(rbind, spllists[[1]])
    #                                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    # ch1_img1_ch1ROI.Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    # ch1_img1_ch1ROI.Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    # ch1_img2_ch1ROI.Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    # ch1_img2_ch1ROI.Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    

    To do this for every element of spllists, we'll use:

    alldat2 <- lapply(spllists, function(x) do.call(rbind, x))
    alldat2
    # $`1_1`
    #                                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    # ch1_img1_ch1ROI.Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    # ch1_img1_ch1ROI.Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    # ch1_img2_ch1ROI.Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    # ch1_img2_ch1ROI.Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    # $`1_2`
    #                                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
    # ch1_img1_ch2ROI.Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    # ch1_img1_ch2ROI.Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    # $`2_1`
    #                             mpg cyl  disp  hp drat   wt  qsec vs am gear carb
    # ch2_img1_ch1ROI.Duster 360 14.3   8 360.0 245 3.21 3.57 15.84  0  0    3    4
    # ch2_img1_ch1ROI.Merc 240D  24.4   4 146.7  62 3.69 3.19 20.00  1  0    4    2
    # $`2_2`
    #                             mpg cyl  disp  hp drat   wt qsec vs am gear carb
    # ch2_img1_ch2ROI.Merc 230   22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
    # ch2_img1_ch2ROI.Merc 280   19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
    # ch2_img2_ch2ROI.Merc 280C  17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
    # ch2_img2_ch2ROI.Merc 450SE 16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
    
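Putting the three steps together, here is a minimal sketch under the same assumptions as above (a named list of data frames with identical columns). The helpers `group_label()` and `combine_by_channel()` are hypothetical names introduced for this example. Two things differ from the steps above, by choice: the groups are labelled in the ch1_ch2ROI style you asked for, and each row records its source file in a `source` column instead of in the row name.

```r
# Same fake data as above.
alldat <- list(
  ch1_img1_ch1ROI = mtcars[1:2, ],
  ch1_img1_ch2ROI = mtcars[3:4, ],
  ch1_img2_ch1ROI = mtcars[5:6, ],
  ch2_img1_ch1ROI = mtcars[7:8, ],
  ch2_img1_ch2ROI = mtcars[9:10, ],
  ch2_img2_ch2ROI = mtcars[11:12, ]
)

# Hypothetical helper: drop the "_imgN" part, e.g. "ch1_img2_ch2ROI" -> "ch1_ch2ROI".
group_label <- function(nms) sub("_img[0-9]+", "", nms)

# Hypothetical helper: split the frames by X/Y label and row-bind each group.
combine_by_channel <- function(frames) {
  # do.call(rbind, ...) needs a list of data frames, so fail early otherwise.
  stopifnot(is.list(frames), all(vapply(frames, is.data.frame, logical(1))))
  groups <- split(frames, group_label(names(frames)))
  lapply(groups, function(x) {
    # Tag every row with the name of the frame it came from.
    x <- Map(function(df, nm) cbind(source = nm, df), x, names(x))
    # unname() keeps rbind from prefixing the row names with the frame names.
    do.call(rbind, unname(x))
  })
}

combined <- combine_by_channel(alldat)
names(combined)
# [1] "ch1_ch1ROI" "ch1_ch2ROI" "ch2_ch1ROI" "ch2_ch2ROI"
sapply(combined, nrow)
```

If the frames do not all share the same columns, `rbind` will stop with an error, so this sketch only applies when every file has the same layout.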
r2evans
  • Hi, thank you so much for your reply. I went through your link as well as your code. I couldn't execute the following: do.call(rbind, spllists[[1]]) [the error message was: Error in do.call(rbind, x) : the second argument must be a list]. When I checked is.list(spllists[[1]]), it returned **FALSE**. I understand the rationale behind your code but am unable to execute it. Please help! – nano Aug 07 '20 at 13:46
  • I made a couple of assumptions about your data. If my assumptions are not correct, then it is (and always has been) your responsibility to provide valid, representative, small sample data. Would you please read about how to add this to your question? See https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Thanks! – r2evans Aug 07 '20 at 15:14