
I'm trying to insert a check step in my R script to determine if the structure of the CSV table I'm reading is as expected. See details: table.csv has the following colnames: [1] "A","B","C","D"

This file is generated by someone else, so at the beginning of my script I'd like to make sure that the column names and the number and order of columns have not changed.

I tried to do the following:

    #dataframes to import
    df_table <- read.csv('table.csv')

    #define correct structure of file
    Correct_Columns <- c('A','B','C','D')
    #read current structure of table
    Current_Columns <- colnames(df_table)

    #Check whether CSV was correctly imported from Source
    if(Current_Columns != Correct_Columns)

    {
    # if structure has changed, stop the script. 
    stop('Imported CSV has a different structure, please review export from Source.')
    } 
    #if not, continue with the rest of the script...

Thanks in advance for any help!

Mirko Gagliardi
  • I think you want `if(any(Current_Columns != Correct_Columns))`. I assume you were getting some error before? It would be helpful to include information about why your solution didn't work. – MrFlick Mar 05 '18 at 16:22

2 Answers


Using base R, I suggest you take a look at all.equal(), identical() or any().

See the following example:

    a <- c(1,2)
    b <- c(1,2)
    c <- c(1,2)
    d <- c(1,2)
    df <- data.frame(a,b,c,d)

    names.df <- colnames(df)
    names.check <- c("a","b","c","d")

    # all.equal() returns TRUE or a character vector describing the
    # differences, so wrap it in isTRUE() before negating
    !isTRUE(all.equal(names.df,names.check))
    # [1] FALSE

    !identical(names.df,names.check)
    # [1] FALSE

    any(names.df!=names.check)
    # [1] FALSE

Accordingly, your code could be modified as follows:

    if(!isTRUE(all.equal(Current_Columns,Correct_Columns)))
    {
    # call your stop statement here
    }

Your code probably throws a warning (since R 4.2.0, an error) because Current_Columns!=Correct_Columns compares the vectors element by element, so the condition passed to if() has one TRUE/FALSE value per column instead of a single value. Running Current_Columns!=Correct_Columns on its own in the console shows this vector.

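In contrast, all.equal() and identical() compare the vectors as whole objects. Note that all.equal() returns either TRUE or a character description of the differences, so wrap it in isTRUE() when using it inside if().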
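To make the difference concrete, here is a small sketch with made-up column names (`expected` and `actual` are hypothetical vectors, not taken from the question's file):

```r
# Hypothetical column names; `actual` has a renamed third column
expected <- c("A", "B", "C", "D")
actual   <- c("A", "B", "X", "D")

# Element-wise comparison: one logical value per column
elementwise <- actual != expected
print(elementwise)

# Whole-object comparison: always a single TRUE/FALSE
same_identical <- identical(actual, expected)

# all.equal() returns TRUE or a character description of the differences,
# so it is wrapped in isTRUE() to get a single logical value
same_all_equal <- isTRUE(all.equal(actual, expected))

# any() collapses the element-wise result to a single value
has_mismatch <- any(elementwise)
```

Only the last three results are safe to use directly as the condition of an if() statement.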

For the sake of completeness, please be aware of the slight difference between all.equal() and identical(): all.equal() tolerates tiny numerical differences, whereas identical() requires an exact match. In your case it doesn't matter which one you use, but it can become important when dealing with numerical vectors. See here for more information.
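A minimal illustration of that difference, using the usual floating-point example (the variable names are only for illustration):

```r
x <- 0.1 + 0.2
y <- 0.3

# identical() requires an exact match, which floating-point rounding breaks
exact_match <- identical(x, y)

# all.equal() compares within a small numeric tolerance
near_match <- isTRUE(all.equal(x, y))
```

Here exact_match is FALSE while near_match is TRUE. For character vectors such as column names, no tolerance is involved, so both functions agree.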

JSN
  • Thank you so much, it worked like a charm! Only issue now is that it won't work when I recall the script from a master script: > #### Run script #### > source('location/...[TRUNCATED] Error in eval(ei, envir) : Imported CSV has a different structure, please review export from Source. – Mirko Gagliardi Mar 06 '18 at 12:28
  • Almost forgot, if I run the very same script by itself, it works, and the IF instance does not trigger because file structure is actually the same. Only when I run it from the master script it fails the IF test... – Mirko Gagliardi Mar 06 '18 at 12:38
  • I guess the code of your master script would be necessary to see where the bug is. My first guess would be that your working directory is a different one when running the master script and therefore a different "table.csv" file is loaded. But that's no more than a wild guess at this point. – JSN Mar 07 '18 at 09:32

A quick way with data.table:

    library(data.table)
    DT <- fread("table.csv")
    Correct_Columns <- c('A','B','C','D')
    # read the column names from the table that was just imported
    Current_Columns <- colnames(DT)

Check whether there is a FALSE in the pairwise matching:

    # parentheses are needed because %in% binds more tightly than ==
    if(FALSE %in% (Current_Columns == Correct_Columns)){
      stop('Imported CSV has a different structure, please review export from Source.')
    }
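One caveat with element-wise `==`: if the file gains or loses columns, the two vectors differ in length and R recycles the shorter one, so a pairwise comparison can be misleading. Below is a minimal sketch of a length-safe check. It uses a hypothetical helper `check_columns()`, which is not part of base R or data.table, and in-memory data frames standing in for the imported file:

```r
# Hypothetical helper: stops with a descriptive message when the
# column names, their number, or their order differ from what is expected
check_columns <- function(current, expected) {
  if (!identical(current, expected)) {
    stop(
      "Imported CSV has a different structure. Expected: ",
      paste(expected, collapse = ", "),
      "; found: ",
      paste(current, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Matching structure: passes silently
df_ok <- data.frame(A = 1, B = 2, C = 3, D = 4)
check_columns(colnames(df_ok), c("A", "B", "C", "D"))

# An extra column is caught even though the first four names match
df_extra <- data.frame(A = 1, B = 2, C = 3, D = 4, E = 5)
result <- tryCatch(
  check_columns(colnames(df_extra), c("A", "B", "C", "D")),
  error = function(e) conditionMessage(e)
)
```

Using identical() means a change in the number of columns fails the check outright rather than depending on how the recycled comparison happens to line up.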

C-x C-c