Filtering data table based on condition in Column

Question

I am trying to download the EOD data from the NSE site. The data contains every series type: EQ, BE, DR, N1, and so on. I want to filter the table to keep only the rows where the "SERIES" column is EQ, BE, or DR, and drop all other series.

After reading and writing, the data structure looks like this:

      DATE SERIES     SYMBOL     OPEN     HIGH      LOW    CLOSE   VOLUME
1    2016-05-27     EQ  20MICRONS    28.30    29.20    28.05    28.25    31468
2    2016-05-27     EQ 3IINFOTECH     4.20     4.25     3.90     3.95  2209977
3    2016-05-27     EQ    3MINDIA 13170.00 13300.00 12611.00 12699.00     5511
4    2016-05-27     EQ    8KMILES  1717.00  1770.95  1685.00  1710.45    33558
5    2016-05-27     EQ   A2ZINFRA    24.80    25.65    24.70    25.15   102189
6    2016-05-27     EQ AARTIDRUGS   458.05   473.85   458.05   468.95    11140
7    2016-05-27     EQ   AARTIIND   512.60   519.95   512.20   516.20    13101
8    2016-05-27     EQ  AARVEEDEN    58.00    59.00    57.20    58.55     3436
9    2016-05-27     EQ       ABAN   198.55   202.50   198.50   199.55   999288
10   2016-05-27     EQ        ABB  1241.80  1273.85  1234.40  1253.95    51180
11   2016-05-27     EQ ABBOTINDIA  4703.00  4764.00  4639.70  4751.70     2663
12   2016-05-27     EQ      ABFRL   137.80   141.00   133.50   134.50   541872

I tried to use the which command, but it only returns the EQ series.

The code used is:

#28-10-2014: Fix for '403 Forbidden'
## Credit http://stackoverflow.com/questions/26086868/error-downloading-a-csv-in-zip-from-website-with-get-in-r

library(httr)

#Define Working Directory, where files would be saved
setwd('D:/FII Stats/')

#Define start and end dates, and convert them into date format
startDate = as.Date("2016-05-26", format="%Y-%m-%d")
endDate =   as.Date("2016-05-27", format="%Y-%m-%d")

#work with date, month, year for which data has to be extracted
myDate = startDate
zippedFile <- tempfile() 

while (myDate <= endDate){
  filenameDate = paste(as.character(myDate, "%y%m%d"), ".csv", sep = "")
 monthfilename=paste(as.character(myDate, "%y%m"),".csv", sep = "")
 downloadfilename=paste("cm", toupper(as.character(myDate, "%d%b%Y")), "bhav.csv", sep = "")
 temp =""

  #Generate URL
 myURL = paste("http://www.nseindia.com/content/historical/EQUITIES/", as.character(myDate, "%Y"), "/", toupper(as.character(myDate, "%b")), "/", downloadfilename, ".zip", sep = "")

  #retrieve Zipped file
  tryCatch({
  #Download Zipped File

#28-10-2014: Fix for '403 Forbidden'
  #download.file(myURL,zippedFile, quiet=TRUE, mode="wb",cacheOK=TRUE)
  GET(myURL, user_agent("Mozilla/5.0"), write_disk(paste(downloadfilename,".zip",sep="")))


  #Unzip file and save it in temp 
  #28-10-2014: Fix for '403 Forbidden'
  temp <- read.csv(unzip(paste(downloadfilename,".zip",sep="")), sep = ",",as.is=TRUE) 

  #temp <-  temp[which(temp$SERIES=="EQ" | "DR" | "BE"), ]


  #Rename Columns Volume and Date
  colnames(temp)[9] <- "VOLUME"
  colnames(temp)[11] <- "DATE"

  #Define Date format
  temp$DATE <- as.Date(temp$DATE, format="%d-%b-%Y")

  #Reorder Columns and Select relevant columns
   temp<-subset(temp,select=c("DATE","SERIES","SYMBOL","OPEN","HIGH","LOW","CLOSE","VOLUME"))
   #temp<-subset(temp,temp[temp$"SERIES" == "BE & DR & EQ", ],select=c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME"))

  #Write the BHAVCOPY csv - datewise
  write.csv(temp,file=filenameDate,row.names = FALSE)

  #Write the csv in Monthly file
  if (file.exists(monthfilename))
  {
   write.table(temp,file=monthfilename,sep=",", eol="\n", row.names = FALSE, col.names = FALSE, append=TRUE)
  }else
  {
   write.table(temp,file=monthfilename,sep=",", eol="\n", row.names = FALSE, col.names = TRUE, append=FALSE)
  }


  #Print Progress
  #print(paste (myDate, "-Done!", endDate-myDate, "left"))
 }, error=function(err){
  #print(paste(myDate, "-No Record"))
 }
 )
  myDate <- myDate+1
  print(paste(myDate, "Next Record"))
}

 #Delete temp file - Bhavcopy
 junk <- dir(pattern="cm")
 file.remove(junk)

How to get the desired result?

score 5 · Accepted Answer · edited Jan 25 '18 at 15:18

Use %in% rather than "==". You cannot write x == "A" | "B", but you can write x %in% c("A", "B"). Also, do not use subset if you are choosing to index with "[". It is an either-or choice:

temp <- temp[ temp$"SERIES" %in% c("BE",  "DR", "EQ") ,   # row selection rule
             c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME") ] #col select
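To illustrate why the "==" with "|" attempt fails while %in% works, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are made up for illustration and are not NSE data.

```r
# Toy data frame (made-up values) standing in for the downloaded bhavcopy
toy <- data.frame(SERIES = c("EQ", "BE", "N1", "DR", "EQ"),
                  SYMBOL = c("AAA", "BBB", "CCC", "DDD", "EEE"),
                  CLOSE  = c(10, 20, 30, 40, 50),
                  stringsAsFactors = FALSE)

# `|` combines logical vectors; a bare string such as "DR" is not one,
# so this expression raises an error instead of filtering
bad <- tryCatch(toy$SERIES == "EQ" | "DR" | "BE",
                error = function(e) "error")

# The long-hand form compares the column against each value separately
keep_long <- toy$SERIES == "EQ" | toy$SERIES == "DR" | toy$SERIES == "BE"

# %in% is the compact equivalent of the long-hand form
keep_in <- toy$SERIES %in% c("EQ", "DR", "BE")

# Row selection with "[" keeps only the wanted series
filtered <- toy[keep_in, ]
```

In the question's script the failing line sits inside tryCatch with an empty error handler, so an error of this kind would be swallowed silently rather than reported.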

Or use subset this way:

temp<-subset(temp,   SERIES %in% c("BE",  "DR", "EQ"),   # NSE, so use unquoted colname
               select=c("DATE","SYMBOL", "OPEN", "HIGH", "LOW", "CLOSE", "LAST", "VOLUME"))

It is probably better to use the "[" function if you plan on doing any programming with R. The NSE in subset (non-standard evaluation here, not the exchange) is a source of ongoing errors. Safest of all is to avoid '$' as well:

temp <- temp[ temp[["SERIES"]] %in% c("BE",  "DR", "EQ") ,   # row selection rule
             c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME") ] # col select
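One way the non-standard evaluation in subset can go wrong is when it is called inside a function whose argument shares a name with a column. The sketch below uses a made-up data frame and two hypothetical helpers, `filter_subset` and `filter_bracket`, to show the difference. It is an illustration of the pitfall, not a claim about every use of subset.

```r
# Toy data frame (made-up values)
toy <- data.frame(SERIES = c("EQ", "BE", "N1"),
                  CLOSE  = c(10, 20, 30),
                  stringsAsFactors = FALSE)

# Hypothetical helper: the argument happens to be named like the column.
# Inside subset(), both occurrences of SERIES resolve to the column,
# so the condition is TRUE for every row and nothing is filtered out.
filter_subset <- function(d, SERIES) {
  subset(d, SERIES %in% SERIES)
}

# Hypothetical helper using "[" and "[[": the column is named by a string,
# so the argument SERIES is unambiguously the vector of wanted values.
filter_bracket <- function(d, SERIES) {
  d[d[["SERIES"]] %in% SERIES, ]
}

n_subset  <- nrow(filter_subset(toy, c("EQ", "BE")))   # all rows survive
n_bracket <- nrow(filter_bracket(toy, c("EQ", "BE")))  # only EQ and BE rows
```

The subset version fails silently: it returns a data frame of the right shape with the wrong rows, which is why the "[" form is the safer habit in scripts and functions.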

Used the 1st code snippet and the result is coming as desired. Put the quotation mark with BE and EQ or result will not be proper. Thanks — sr123, May 29 '16 at 20:03

score 2 · Answer 2 · answered May 29 '16 at 20:30

This will do the work:

library(data.table)

output <- setDT(df)[SERIES %in% c("EQ", "BE", "DR") ]

rafa.pereira
