0

I have a difficult question I was hoping you friendly folks can help me with.

I have a dataset (df1) that includes variable company id, day, day-3,day+3 , Each id represents a company ticker symbol.

dataset (news) includes variable company id, day,newstitle,weight

I want to create two variables event1 and event2 in dataset(df1)

event1 =it counts events of newstile from dataset(news) between the time range[df1$range-3,df2$range+3] with news$newstitle contains "order" for each df$id

event2 = it counts events of newstile between the time range[df1$range-3,df2$range+3] with news$newstitle contains "dividend" for each df@id

I have rewrite my question for better understanding. I really appreciate your help.

Best.

#this would be desired result with new vars event1,event2 
out <- read.table(text="
                   id,date,date_bef3,date_aft3,event1,event2        
                   1605,1992/12/15,1992/12/12,1992/12/18,0,0
                   1705,1992/12/30,1992/12/27,1993/1/2,1,1
                   3412,1992/12/31,1992/12/28,1993/1/3,0,0  
                   9921,1993/1/7,1993/1/4,1993/1/10,0,0
                   2314,1993/1/18,1993/1/15,1993/1/21,1,0",
                   header=T,sep=",")

#this is index data
df1<-out[,-c(5,6)]

#this is simulated news source data
news<- read.table(text="
                  id,date,newstitle,weight  
                  2543,1992/12/30,new order,1
                  1705,1992/12/29,dividend payment,1
                  1705,1993/1/1,new order,1
                  9921,1993/1/1,new product,1
                  2314,1993/1/16,new order,1",
                  header=T,sep=",")

it is like the sql code like that:

proc sql;select *
  sum (case when news$newstitle="order" ) as event1,
  sum (case when news$newstitle="dividend" ) as event2,
from df1,new
where news$date between df$date_bf3 and df$date_af3 ,
news$id=df$id

my initial procs:

1: inner merge df1 and news by "id"

2: set dummy event1=1 if newdate between(date_bef3,date_aft3) and newstitle contain "order" set dummy event2=1 if newdate between(date_bef3,date_aft3) and newstitle contain "dividend"

3: collapse (sum)event1 event2 by(id,date)

setDT(df1,key="id")
setDT(news,key="id")
%inner merge
df<-df1[news,]

%set dummy event1, something wrong here 
df[newstitle=="order",event1:=as.numeric(between(newsdate,date_bef3,date_aft3))]
changjx
  • 353
  • 2
  • 8
  • First, provide a data set where dates are given as dates, along with a table representing desired output for the example. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 for guidance. I don't think there's any need to include the +/-3 days columns. – Frank Jul 22 '17 at 15:24
  • I wrote an answer but I must be confused (?). It sounded like you wanted some sort of matching based on the text, but there's only 1 column of text, so what would I be matching it against? – Hack-R Jul 22 '17 at 15:31

1 Answers1

0

Based on your comment below, I believe that what you want is:

# Make dates actual dates instead of factors
library(lubridate)
df1$date_bef3 <- ymd(df1$date_bef3)
df1$date_aft3 <- ymd(df1$date_aft3)
news$date     <- ymd(news$date)

event1 = sum(news$newstitle[news$id == df1$id & df1$date_bef3 <= news$date & df1$date_aft3 >= news$date]=="new order")
event2 = sum(news$newstitle[news$id == df1$id & df1$date_bef3 <= news$date & df1$date_aft3 >= news$date]=="dividend payment")
Hack-R
  • 22,422
  • 14
  • 75
  • 131
  • it likes the code like=> proc sql;select * sum (case when news$newstitle="order" ) as event1, sum (case when news$newstitle="dividend" ) as event2, from df1,new where news$date between df$date_bf3 and df$date_af3 , news$id=df$id – changjx Jul 22 '17 at 16:03
  • @changjx OK, I updated my answer. Also, note that you can use SQL statements on R objects by loading the `sqldf` package, though I don't think this SQL is valid. – Hack-R Jul 22 '17 at 17:37