
I'm sure there is a trivial answer to this, but I can't seem to find the right code. I have a list of files and a list of strings, and I would like to read each file into a data frame named after the corresponding string. Then I would like to do further processing on each data frame within the same loop. I also need to keep each data frame for downstream work. Here is my code:

samples <- c('fc14','g14','fc18','g18','fc21','g21')
fc_samples <- grep("fc", samples, value=TRUE)
fc_files <- c('fc14_g14_full_annot_uniq.txt','fc18_g18_full_annot_uniq.txt','fc21_g21_full_annot_uniq.txt')


# make dataframes
for (file in fc_files)
{   fc_n <- 1
    g_n <- 1
    print(file);

    # THE BIT THAT DOESN'T WORK
    assign(paste("data", fc_samples[fc_n], sep='_'), read.table(file,sep = "\t", header=T));

    # HERE I EXPECT THE TOP OF MY DF TO BE PRINTED BUT IT ISN'T
    head(data_fc14);

    # I TRY THIS INSTEAD
    do.call("<-",list(paste("data", fc_samples[fc_n], sep='_'), read.table(file,sep = "\t", header=T)))

    # I TRY TO PRINT THE DF AGAIN BUT STILL NO LUCK
    head(paste("data", fc_samples[fc_n], sep='_'))

    # FIRST DOWNSTREAM THING I WOULD LIKE TO DO,
    # WON'T WORK UNTIL I SOLVE THE DF ASSIGNMENT ISSUE
    names(paste("data", fc_samples[fc_n], sep='_'))[names(paste("data", fc_samples[fc_n], sep='_'))==c('SAMPLE_fc','CHROM_fc','START_fc','REF_fc','ALT_fc','REGION_fc','DP_fc','FREQ_fc','GENE_fc','AFFECTS_fc','dbSNP_fc',
                                'NOVEL_fc')] <- c('SAMPLE','CHROM','START','REF','ALT','REGION','DP','FREQ','GENE','AFFECTS','dbSNP','NOVEL')

    # ITERATE TO THE NEXT FILE
    fc_n <- fc_n+1
}

I tried the solutions from here and here, but they didn't help. If anyone has an elegant solution, that would be great. Thanks in advance!

user3062260
  • Doing `head` in a loop will not print to the console. You have to explicitly `print` it. – James Feb 01 '17 at 12:07
  • Is it just not printing, or is the object not being created? – Benjamin Feb 01 '17 at 12:13
  • Thanks for your responses. I wrapped the head statement in print() and now it prints what I expect; however, I can't refer to the object in a later step of the loop. I get this error: Error in names(paste("data", fc_samples[fc_n], sep = "_"))[names(paste("data", : target of assignment expands to non-language object – user3062260 Feb 01 '17 at 12:39
  • Basically, I need a way to refer to the object created in the loop that doesn't involve calling it directly by name ('data_fc14') – user3062260 Feb 01 '17 at 12:43
  • `fc_n <- 1` should be outside the loop. Your `assign()` seems to work for me. I think what you're looking for is `get("data_fc14")`, though it won't work on the left-hand side of a `names(get("data_fc14")) <- ...` expression. You'll have to copy it, modify its names, then reassign it. – Aurèle Feb 01 '17 at 14:20
  • But really, all this is not considered best practice, and generally frowned upon. Since you're asking for the "elegant" way, you should work with one list of all your dataframes. See https://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-turn-a-string-into-a-variable_003f – Aurèle Feb 01 '17 at 14:21
  • Thanks for this, I got there in the end using get(), assign() and eval(); it seemed the most straightforward approach for novice users like myself – user3062260 Feb 01 '17 at 16:47
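A minimal sketch of the two points raised in the comments, assuming a small made-up data frame in place of the real files: inside a `for` loop, `head()` only shows output when wrapped in `print()`, and `get()` fetches an object by its name string but cannot be the target of an assignment, so the object is copied, renamed, and written back with `assign()`. The object name `data_fc14` and its two columns are illustrative only.

```r
# Hypothetical stand-in for one of the files read in the question
assign("data_fc14", data.frame(SAMPLE_fc = 1:3, CHROM_fc = c("1", "2", "X")))

for (nm in c("data_fc14")) {
  head(get(nm))          # evaluated, but NOT auto-printed inside a loop
  print(head(get(nm)))   # explicit print() shows the first rows

  # names(get(nm)) <- ... would fail (there is no `get<-`),
  # so copy the object, rename the copy, and assign it back
  df <- get(nm)
  names(df) <- sub("_fc$", "", names(df))
  assign(nm, df)
}

names(data_fc14)
# [1] "SAMPLE" "CHROM"
```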

2 Answers


Fixing your code:

samples <- c('fc14','g14','fc18','g18','fc21','g21')
fc_samples <- grep("fc", samples, value=TRUE)

# Make dummy example files
fc_files <- file.path("example-data", c(
  'fc14_g14_full_annot_uniq.txt','fc18_g18_full_annot_uniq.txt',
  'fc21_g21_full_annot_uniq.txt'))
set.seed(123) ; dummy_df <- 
  setNames(
    as.data.frame(replicate(12, rnorm(7))),
    c('SAMPLE_fc','CHROM_fc','START_fc','REF_fc','ALT_fc','REGION_fc',
      'DP_fc','FREQ_fc','GENE_fc','AFFECTS_fc','dbSNP_fc','NOVEL_fc')
  )
if (!dir.exists("./example-data")) dir.create("example-data")
invisible({
  lapply(fc_files, write.table, x = dummy_df, sep = "\t")
})

# "fc_n <- 1" should be outside the loop:
fc_n <- 1
for (file in fc_files) {
  g_n <- 1
  assign(paste("data", fc_samples[fc_n], sep='_'), 
         read.table(file, sep = "\t", header = TRUE))
  # Copy data to be able to change its names
  f <- get(paste("data", fc_samples[fc_n], sep='_'))
  names(f)[names(f) == c('SAMPLE_fc','CHROM_fc','START_fc',
                         'REF_fc','ALT_fc','REGION_fc',
                         'DP_fc','FREQ_fc','GENE_fc','AFFECTS_fc',
                         'dbSNP_fc','NOVEL_fc')] <- 
    c('SAMPLE','CHROM','START','REF','ALT','REGION','DP','FREQ',
      'GENE','AFFECTS','dbSNP','NOVEL')
  # Assign it back, now that names have been changed
  assign(paste("data", fc_samples[fc_n], sep='_'), f)
  fc_n <- fc_n+1
}
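The `names(f) == c(...)` comparison above is positional: it only renames correctly when the columns appear in exactly that order. A sketch of an order-independent alternative, using `match()` to look up each current name in the list of old names; the three-column data frame and the shortened name vectors are made up for illustration.

```r
old <- c("SAMPLE_fc", "CHROM_fc", "START_fc")
new <- c("SAMPLE", "CHROM", "START")

# Columns deliberately in a different order than `old`, plus one extra column
f <- data.frame(START_fc = 10:11, SAMPLE_fc = c("a", "b"), OTHER = 1:2)

# Position of each current name in `old` (NA when it is not listed)
idx <- match(names(f), old)
names(f)[!is.na(idx)] <- new[idx[!is.na(idx)]]

# Names are now START, SAMPLE, OTHER; unlisted columns are left untouched
print(names(f))
```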

A "more elegant" way:
Using assign() is generally not considered best practice; it is better to work with lists.
That said, I occasionally use it myself, and there are sometimes good reasons to.

# For the '%>%' pipe
library(magrittr)

data <-
  samples %>% 
  grep(pattern = "fc", value = TRUE) %>% 
  setNames(nm = .) %>% 
  lapply(grep, x = fc_files, value = TRUE) %>% 
  lapply(read.table, sep = "\t", header = TRUE) %>% 
  lapply(function(f) setNames(f, sub("_fc", "", names(f))))

identical(data_fc14, data$fc14)
# [1] TRUE
identical(data_fc18, data$fc18)
# [1] TRUE
identical(data_fc21, data$fc21)
# [1] TRUE

# Clean up
print(unlink("example-data", recursive = TRUE))
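For readers who prefer not to load magrittr, a sketch of the same list-based approach in base R. It writes two small dummy files to a temporary directory (file names and columns are made up), reads them into a named list, strips the `_fc` suffix, and, only if separate objects are really needed, copies the list elements into the global environment with `list2env()`.

```r
# Dummy input files in a temporary directory (hypothetical contents)
tmp <- tempfile("example-data")
dir.create(tmp)
fc_samples <- c("fc14", "fc18")
fc_files <- file.path(tmp, paste0(fc_samples, "_full_annot_uniq.txt"))
dummy_df <- data.frame(SAMPLE_fc = c("s1", "s2"), DP_fc = c(10L, 20L))
for (p in fc_files) {
  write.table(dummy_df, p, sep = "\t", quote = FALSE, row.names = FALSE)
}

# One named list instead of many loose variables
data <- lapply(fc_files, read.table, sep = "\t", header = TRUE)
names(data) <- fc_samples
data <- lapply(data, function(f) setNames(f, sub("_fc$", "", names(f))))

str(data$fc14)

# Only if separate objects are really needed: creates data_fc14, data_fc18
invisible(list2env(setNames(data, paste("data", names(data), sep = "_")),
                   envir = globalenv()))

unlink(tmp, recursive = TRUE)
```

Keeping everything in `data` means later steps can be a single `lapply()` over the list rather than a loop over name strings.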
Aurèle
  • This looks like a really nice solution with the shortest code, so I've accepted this answer. It may take me a while to figure out what all the parts are doing, but thank you! – user3062260 Feb 01 '17 at 16:52
samples <- c('fc14','g14','fc18','g18','fc21','g21')
fc_samples <- grep("fc", samples, value=TRUE)
fc_files <- c('fc14_g14_full_annot_uniq.txt','fc18_g18_full_annot_uniq.txt','fc21_g21_full_annot_uniq.txt')
g_files <- c('g14_full_annot_uniq.txt','g18_full_annot_uniq.txt','g21_full_annot_uniq.txt')

# make dataframes
df_names <- c("data_fc14","data_fc18","data_fc21")
fc_n <- 1
for (file in fc_files)
{   

    assign(df_names[fc_n], read.table(file,sep = "\t", header=T)); #WORKS
    #do.call("<-",list(paste("data", fc_samples[fc_n], sep='_'), read.table(file,sep = "\t", header=T))); #ALSO WORKS

    print(head(df_names[fc_n])) 
    print(head(eval(as.symbol(df_names[fc_n]))))

    df <- eval(as.symbol(df_names[fc_n]))

    names(df)[names(df) == c('SAMPLE_fc','CHROM_fc','START_fc','REF_fc','ALT_fc','REGION_fc','DP_fc','FREQ_fc','GENE_fc','AFFECTS_fc','dbSNP_fc',
                                'NOVEL_fc')] <- c('SAMPLE','CHROM','START','REF','ALT','REGION','DP','FREQ','GENE','AFFECTS','dbSNP','NOVEL')

    assign(df_names[fc_n], df)
    print(head(eval(as.symbol(df_names[fc_n]))))
    print(file);
    fc_n <- fc_n+1
}
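In the loop above, `eval(as.symbol(name))` and `get(name)` both look up an object from its name string; `get()` is the more common idiom, and `mget()` fetches several objects at once as a named list. A small check, using made-up data frames:

```r
df_names <- c("data_fc14")
data_fc14 <- data.frame(SAMPLE_fc = 1:2)

a <- eval(as.symbol(df_names[1]))
b <- get(df_names[1])
identical(a, b)
# [1] TRUE

# mget() returns a list named after the requested objects
data_fc18 <- data.frame(SAMPLE_fc = 3:4)
all_dfs <- mget(c("data_fc14", "data_fc18"))
print(names(all_dfs))
```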

Thanks to everyone who helped. In the end I solved it using the advice from "apom", as it is the most intuitive approach for novice R users like me.

user3062260