I would like to run this script multiple times (and do a few other things after that). The data is in a text file (named test.txt) in the following form:

A   B   C   D   E
1   2   2   1   9
3   5   1   3   0
2   NA  4   13  2

and is imported using

test <- read.table("test.txt",header=TRUE)

The data can be converted to a different format and it can be used without headers.

I know I should use an apply function, and I Googled a lot about using both apply functions and for loops, but I wasn't able to implement them successfully.

For example, I get an error message after running the following code:

for(i in names(table)){
  message("Name of the data set:", i)
  outlierKD(table, i)}

Error in eval(expr, envir, enclos) : object 'i' not found

I found a discussion here about the loop's index and also discovered that exists(i) returns false while the message appears properly.
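
One possible reading of the exists(i) result, sketched under the assumption that i holds a column name as in the loop above: exists() takes the name of an object as a character string, so exists(i) asks whether an object called "A" exists, not whether i itself does. The small data frame below is made up for illustration.

```r
# Illustrative data; only two columns are used to keep the output short.
test <- data.frame(A = c(1, 3, 2), B = c(2, 5, NA))

for (i in names(test)) {
  i_exists <- exists("i")    # asks about the loop variable itself
  value_exists <- exists(i)  # asks about an object named "A", then "B"
  message("i = ", i,
          "; exists(\"i\") = ", i_exists,
          "; exists(i) = ", value_exists)
}
```

Columns of a data frame are not objects in their own right, so exists(i) is only TRUE when some unrelated object or function happens to share the column's name.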

I would like to execute the outlier function that checks for outliers in all columns of the data either using apply functions or loops.

zoli

3 Answers

You could try something like this:

source("https://datascienceplus.com/rscript/outlier.R")

A = c(1, 3, 2, 2, 3)
B = c(2, 5, 0, 1, 6)
C = c(2, 1, 4, 7, 8)
D = c(1, 3, 99, 4, 2)
E = c(9, 0, 2, 8, 4)

df = data.frame(A, B, C, D, E)

x <- 0

for (i in df) {
  x <- x + 1
  names <- names(df)
  message("Variable: ", names[x])
  outlierKD(dt = df, var = i)
}

Hope it helps!

Samuel
  • For some reason when the script has to remove the outlier, the original column remains unchanged and a new column named `i` is created, where the outlier is removed. – zoli Feb 17 '17 at 21:55

A for loop is easier in this case.

test <- read.table("test.txt", header=TRUE) #copied and pasted
test

source("https://datascienceplus.com/rscript/outlier.R") #function

for(i in 1:ncol(test)) outlierKD(dt=test, i)

Then, in the R console (interactively), press the Y key to reveal the plots.

David C.
  • Unfortunately, the script doesn't "see" the data this way: `List of outliers: 0 from 1 observations Proportion (%) of outliers: 0 Mean of the outliers: NaN Mean without removing outliers: 1 Mean if we remove outliers: 1` – zoli Feb 17 '17 at 21:51
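
A minimal sketch of why a loop variable can be misread this way, assuming the function resolves its var argument by non-standard evaluation (eval(substitute(var), dt)); that assumption is suggested by the match.call() line quoted later in the thread but is not verified here. Both nse_summary and name_summary are hypothetical stand-ins, not the real outlierKD.

```r
# Hypothetical stand-in for a function that looks `var` up inside `dt`
# by non-standard evaluation (not the real outlierKD).
nse_summary <- function(dt, var) {
  values <- eval(substitute(var), dt)
  list(label = deparse(substitute(var)), n = length(values),
       mean = mean(values, na.rm = TRUE))
}

# Hypothetical alternative that takes the column name as a string.
name_summary <- function(dt, col) {
  values <- dt[[col]]
  list(label = col, n = length(values), mean = mean(values, na.rm = TRUE))
}

test <- data.frame(A = c(1, 3, 2), B = c(2, 5, NA))

typed <- nse_summary(test, A)  # A is found as a column: n = 3, mean = 2

by_index <- list()
for (i in 1:ncol(test)) {
  # `i` is not a column of test, so the lookup falls back to the loop
  # variable itself: a single number, labelled "i".
  by_index[[i]] <- nse_summary(test, i)
}

by_name <- list()
for (col in names(test)) {
  by_name[[col]] <- name_summary(test, col)  # the full column each time
}

str(by_index[[1]])   # label "i", n = 1, mean = 1
str(by_name[["A"]])  # label "A", n = 3, mean = 2
```

Under that assumption, a numeric index yields a one-element "column" (hence "0 from 1 observations"), and any result written back is stored under the literal name i. Passing the column name as a string and indexing with [[ ]] sidesteps both effects.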

I wanted to modify the script a bit and store it in an .r file, along with the script that executes it multiple times (and does some other things), like this:

test <- read.table("test.txt",header=TRUE)
source("outlierKD_mod.r")
source("loopscript.r")
loopscript(test)

This didn't work, so I started creating a single script (with no function declarations) based on @Samuel's code, which can be copy-pasted into the R console. The only change needed in the outlierKD script was to replace this line:

assign(as.character(as.list(match.call())$test), test, envir = .GlobalEnv)

with this:

test[x]=var_name

This command will remove the column added to the data frame by the outlier check:

test <- subset(test, select = -c(i) )
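
For comparison, a non-interactive sketch of the "apply" route asked about in the question. It does not call outlierKD at all: na_outliers is a hypothetical helper that uses boxplot.stats() as the outlier rule, and whether that matches outlierKD's rule exactly is an assumption. The data are the made-up values from @Samuel's answer.

```r
# Hypothetical helper: replace boxplot-rule outliers in one vector with NA.
na_outliers <- function(x) {
  out <- boxplot.stats(x)$out  # values beyond 1.5 * IQR from the hinges
  x[x %in% out] <- NA
  x
}

test <- data.frame(
  A = c(1, 3, 2, 2, 3),
  B = c(2, 5, 0, 1, 6),
  C = c(2, 1, 4, 7, 8),
  D = c(1, 3, 99, 4, 2),
  E = c(9, 0, 2, 8, 4)
)

# Apply the helper to every column; `cleaned[] <-` keeps the data frame shape.
cleaned <- test
cleaned[] <- lapply(test, na_outliers)

# Count how many values were replaced in each column.
n_replaced <- colSums(is.na(cleaned)) - colSums(is.na(test))
print(n_replaced)
print(cleaned)
```

Because each column is overwritten in place, no extra column is created and nothing needs to be dropped afterwards. The trade-off is that the plots and the yes/no prompt of the original script are skipped.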
zoli