I have a data set with 2200 rows. I have to remove a large number of columns (around 400) at a time. This operation happens quite frequently, and the columns to be removed vary each time. The names of the columns to be removed are in a text file.

This is how I approached solving this.

#Reading data
myData = read.csv("myDataFile.csv")

#Getting the column names which should be deleted
colToDelete = read.table("columnsToBeRemoved.txt")

#processing the names list
tempList = as.character(unlist(colToDelete))
cat(paste(shQuote(tempList, type="cmd"), collapse=","))

newDataSet = subset(myData, select = - ??)

I'm using cat(paste(shQuote(tempList, type="cmd"), collapse=",")) to get the list of names as a comma-separated string. The output is:

"04_ic_1306","06_iEC042_1314","13_iEcDH1_1363","18_iEcHS_1320","26_iEcolC_1368","31_iEcSMS35_1347","33_iECs_1301","34_iECUMN_1333","36_iEKO11_1354","39_iJO1366","47_iZ_1308","54_iSFxv_1172"

I've tried the subset and data.table methods, but had no luck with either. I get the error below; I can't work out how to pass the string of names to the select argument.

Error in -a : invalid argument to unary operator

I was mainly referring to this previous Stack Overflow question.

SriniShine
  • Because you didn't give a reproducible example I have to guess, but `subset(d, select = setdiff(names(d), tempList))` might work. For a reproducible example, see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 – kasterma Jul 22 '15 at 09:21
  • Hey Kasterma, sorry about the previous comment. Your method works perfectly fine. Thank you for the answer. – SriniShine Jul 22 '15 at 09:40
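
The `setdiff()` suggestion from the comment above can be sketched as follows. This is only an illustration under assumptions: the small `myData` frame and the two-element `tempList` are hypothetical stand-ins for the real CSV and text file, which are not shown in the question.

```r
# Hypothetical stand-in for myData. check.names = FALSE keeps names that start
# with a digit as they are; by default data.frame() and read.csv() pass names
# through make.names(), which would turn "04_ic_1306" into "X04_ic_1306".
myData <- data.frame(
  "04_ic_1306" = 1:3,
  "39_iJO1366" = 4:6,
  "keep_me"    = 7:9,
  check.names  = FALSE
)

# Hypothetical stand-in for the names read from columnsToBeRemoved.txt
tempList <- c("04_ic_1306", "39_iJO1366")

# Select every column whose name is NOT in tempList
newDataSet <- subset(myData, select = setdiff(names(myData), tempList))
```

Passing the names to keep avoids negating a character vector, which is what appears to trigger the "invalid argument to unary operator" error.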

2 Answers

# Name of the column to drop
b <- "04_ic_1306"
# Assigning NULL removes that column from the data frame `a`
a[, b] <- NULL

To do this for many columns, you may have to write a loop and store the column names in a vector like this:

[1] "04_ic_1306"       "06_iEC042_1314"   "13_iEcDH1_1363"   "18_iEcHS_1320"   
[5] "26_iEcolC_1368"   "31_iEcSMS35_1347" "33_iECs_1301"     "34_iECUMN_1333"  
[9] "36_iEKO11_1354"   "39_iJO1366"       "47_iZ_1308"       "54_iSFxv_1172" 
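
As a possible alternative to a loop, NULL can be assigned to several columns in one step. A minimal sketch, assuming a small hypothetical data frame in place of `a` and made-up column names:

```r
# Hypothetical data frame standing in for `a`
a <- data.frame(x = 1:3, y = 4:6, z = 7:9)

# Names of the columns to drop, held in a character vector
b <- c("x", "z")

# Restrict the list to names that are actually present in `a`
b <- intersect(b, names(a))

# Assigning NULL to several columns at once removes all of them
a[b] <- NULL
```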
Anuja Parikh

This might be a solution for you:

# Create data frame with 5 columns
df <- data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10), d=rnorm(10), e=rnorm(10))

# Select two columns to be removed
remove_col <- c("b", "d")

# Identify them in the column names
remove_col <- names(df) %in% remove_col

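# Remove them by indexing with the negated (!) logical vector;
# drop = FALSE keeps the result a data frame even if only one column remains
df[, !remove_col, drop = FALSE]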
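
Applied to the situation in the question, where the names come from a text file, the same idea might look like the sketch below. The temporary file, its contents and the variable names `path`, `to_remove` and `result` are assumptions for illustration, and `readLines()` is used here in place of the `read.table()` call from the question because it returns a character vector directly.

```r
# Write a hypothetical list of names, one per line, to a temporary file
# standing in for columnsToBeRemoved.txt
path <- tempfile(fileext = ".txt")
writeLines(c("b", "d"), path)

# Hypothetical data frame with five columns
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)

# Read the names to remove as a plain character vector
to_remove <- readLines(path)

# Keep the columns whose names are not in the list
result <- df[, !(names(df) %in% to_remove), drop = FALSE]
```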