
I have a lot of TSV files that I want to read into R. I do not want to merge them. I made a list of all the files and read them in a loop. This mostly works, except that some of my files have an empty field in the last column of the first row, which makes read_tsv drop that column entirely. In the past I simply edited the files to put a - in the empty field, but that is not practical this time because I have hundreds of files to deal with. Can you help me with this?

library(readr)
public_folder <- "C:/Users/eulus/Desktop/test/interproscan/public_results/"
public_file_list <- list.files(path = public_folder, pattern = "\\.tsv$")  # pattern is a regex, not a glob
for (i in seq_along(public_file_list)) {
  assign(public_file_list[i],
         read_tsv(paste0(public_folder, public_file_list[i]), col_names = FALSE, skip_empty_rows = FALSE))
}

I have tried defining column names and column types, but neither solved the issue, and frankly I am at a loss here. Below is the column specification R printed for two files; it shows that column 14 (X14) is simply dropped in the second file.

-- Column specification ----------------------------------------------------------
Delimiter: "\t"
chr (10): X1, X2, X4, X5, X6, X9, X11, X12, X13, X14
dbl  (3): X3, X7, X8
lgl  (1): X10

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 266 Columns: 13                                                                                                
-- Column specification ----------------------------------------------------------
Delimiter: "\t"
chr (9): X1, X2, X4, X5, X6, X9, X11, X12, X13
dbl (3): X3, X7, X8
lgl (1): X10

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
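A quick way to confirm this diagnosis across hundreds of files is to count the tab-separated fields on every line and compare the first line with the widest one. The sketch below uses base R's count.fields(); the helper name field_counts is hypothetical, and the folder path is the one from the question.

```r
# Hypothetical helper: count tab-separated fields per line of one file.
# quote = "" and comment.char = "" stop quote characters or '#' inside
# description fields from distorting the count.
field_counts <- function(path) {
  n <- count.fields(path, sep = "\t", quote = "", comment.char = "")
  c(first_line = n[1], widest = max(n, na.rm = TRUE))
}

public_folder <- "C:/Users/eulus/Desktop/test/interproscan/public_results/"
paths <- list.files(public_folder, pattern = "\\.tsv$", full.names = TRUE)
counts <- sapply(paths, field_counts)  # one column per file
```

A file whose first_line value is smaller than its widest value is one where a reader that guesses the column count from the first line is likely to come up short.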

I also received warnings. Every warning is the same, so I am not posting all of them here.

There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
2: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat) 
EDUlusman
  • (1) We can't help with code to import if we have _zero_ idea what the data looks like. Please provide the raw file contents (first 4-5 rows perhaps) of 2-3 different files that are causing problems. (2) `assign` is a sign of fragile and inefficient code, I suggest `alldat <- lapply(setNames(nm=public_file_list), read_tsv)` to have a list of frames, as a named list (where the name is the filename). See [list of tables](https://stackoverflow.com/a/24376207/3358227) for more discussion for this and related purposes. – r2evans Feb 13 '23 at 14:15

1 Answer


Pretty hard to answer this as is, but I doubt the solution is a for loop. I think you'll want something along the lines of:

public_file_list <- list.files(path = public_folder, pattern = "\\.tsv$", full.names = TRUE)
raw_data <- lapply(public_file_list, read_tsv)

Or maybe

raw_data <- lapply(public_file_list, read_tsv, col_names = FALSE, skip_empty_rows = FALSE)
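Switching to lapply tidies up the loop, but on its own it may not stop read_tsv from inferring too few columns when the first row is short. One possible workaround, sketched here by swapping in base R's read.delim rather than readr, relies on documented read.table behaviour: the number of columns is taken from the length of col.names when that is longer than the first lines of the file, and fill = TRUE pads short rows with missing values. The helper name read_ragged_tsv is hypothetical, and n_cols = 14 is an assumption taken from the column specification in the question; set it to the widest row in your files.

```r
# Hypothetical helper: read a headerless TSV whose rows may lack trailing fields.
# n_cols is an assumption: use at least the widest row you expect.
read_ragged_tsv <- function(path, n_cols = 14) {
  read.delim(
    path,
    header = FALSE,                           # files have no header line
    col.names = paste0("X", seq_len(n_cols)), # fixes the column count up front
    fill = TRUE,                              # pad short rows instead of dropping columns
    quote = "",                               # descriptions may contain quote characters
    na.strings = c("", "-"),                  # empty fields and "-" placeholders become NA
    stringsAsFactors = FALSE
  )
}

public_folder <- "C:/Users/eulus/Desktop/test/interproscan/public_results/"
paths <- list.files(public_folder, pattern = "\\.tsv$", full.names = TRUE)
# Named list of data frames, one per file, keyed by file name
all_data <- lapply(setNames(paths, basename(paths)), read_ragged_tsv)
```

Individual tables can then be pulled out by file name, e.g. all_data[["some_file.tsv"]], instead of being created as separate variables with assign().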