0

I need to do the following:

  1. Scrape tweets using a third party api and store tweet content and other details in a mysql db.

  2. Before adding a new row in the db table, I need to fetch the temp of the place from where the tweet originated and add temp value in the table also.

To do the 2nd task, I wrote 2 python scripts. The first python script adds the mysql trigger to the database. From inside the mysql trigger, I used sys.eval() after reading http://crazytechthoughts.blogspot.sg/2011/12/call-external-program-from-mysql.html to call the second python script which fetches the temperature data using an api.

Both the scripts work perfectly fine individually but the trigger is not able to fetch the temperature data from the second script. How do I fix it?

Script to add trigger

        def create_Triggers(self):
        
        try:
     
            mysql_trig = """
            
            CREATE TRIGGER mysql_Trigger 
            BEFORE INSERT ON tweets FOR EACH ROW BEGIN 
            DECLARE loc CHAR(255); DECLARE result Double; DECLARE city CHAR(255); 
            DECLARE station CHAR(255); DECLARE dtime CHAR(255); SET city = NEW.city; 
            SET station =  "OYSN:9:YE"; SET dtime = NEW.dtime; 
            SET loc = CONCAT('python D:test case scripts/Weather_enrichment_triggers/weather_enrichment.py',city,station,dtime); 
            SET result = sys_eval(loc); 
            SET NEW.Temperature = result; 
            END;
            """
           
            self.curr.execute(mysql_trig)
            print("trigger executed")
            self.connection.commit()
            
            #self.curr.execute(postgre_trig)
            #self.connection.commit()
        
        except Exception as e:
            print(e)

    def test_triggers(self, query):

            self.curr.execute(query)
            self.connection.commit()

Script to fetch temp data(called from inside the trigger)

city  = sys.argv[1]
station = sys.argv[2]
dtime = sys.argv[3]

def weather_info(city,station,dtime):
        
#Get weather information for a given city and date
   template_request = "https://api.weather.com/v1/location/{station}/observations/historical.json?apiKey=apikey&units=m&startDate={start_date}&endDate={end_date}"
   df_header = ["City", "Year", "Month", "Day", "Hour", "Temperature(C)", "Condition"]

   def get_weather_data(city, year, month, day, station):
       start_date = "%d%02d%02d" % (year, month, day)
       end_date = "%d%02d%02d" % (year, month, day)
       request = template_request.format(station=station, start_date=start_date, end_date=end_date)
       request_data = json.loads(requests.get(request).content)
       weather_data = []
       last_dt = None
       for observation in request_data["observations"]:
           dt = datetime.fromtimestamp(observation["valid_time_gmt"]+3600)
           if last_dt and dt.hour > (last_dt.hour + 1):
               last_row = deepcopy(weather_data[-1])
               last_row[4] = last_row[4]+1
               weather_data.append(last_row)
           weather_data.append([city, year, month, dt.day, dt.hour, observation["temp"], observation["wx_phrase"]])
           last_dt = dt
       return weather_data
    
   dtime = datetime.strptime(dtime, '%Y-%m-%d %H:%M:%S%z')
   data = get_weather_data(city, dtime.year, dtime.month, dtime.day, station)

   weather_df = pd.DataFrame(data, columns=df_header).drop_duplicates(subset=["City", "Year", "Month", "Day", "Hour"])
   avg = (weather_df["Temperature(C)"].values).mean()
   weather_df = pd.DataFrame()
   return float(avg)



temp = weather_info(city, station, dtime)
temp = str(temp)
sys.stdout.write(temp)
O. Jones
  • 103,626
  • 17
  • 118
  • 172

1 Answers1

0

The sys_eval function returns only the exit code of the script it invokes.

It does not seem to offer a clean way to pass data back to the MySql server from the external script.

Plus, it's ridiculously insecure. Just saying.

Edit

A commonly accepted way to do this is with polling.

  1. have your first script, that inserts the data into your table, set tweets.Temperaure to NULL.

  2. get rid of the trigger and the sys_eval extension.

  3. put this index on that column.

    ALTER TABLE tweets ADD INDEX (Temperature, city)
    
  4. Get your second python script to run in a loop, and every few seconds do

    SELECT id, city 
      FROM tweets
     WHERE temperature IS NULL
    

    Then, for each row retrieved, have your script get the temperature for the city and do

    UPDATE tweets 
       SET Temperature = ###your_value###
     WHERE id = ###the_id_you_SELECTed###
    

The index makes those frequent SELECT statements very fast, especially when they return no rows. The trade off: your newest rows won't yet have temperatures.

Another alternative is the simplest one: get your first python program doing your INSERTs to look up the temperature for each tweet and INSERT it directly along with the rest of the row's data.

For what it's worth, very few applications use these MySQL extensions like sys_eval, and if you put your application on a production MySQL server it is unlikely it will have the extension.

O. Jones
  • 103,626
  • 17
  • 118
  • 172