Original Post
I am trying to gather some tweets using the {rtweet}
package and store them in a MySQL database. But I am getting errors whenever I try to upload a tweet that contains emojis.
Here is my code:
# loading packages
library(rtweet)
library(DBI)
library(RMariaDB)
library(dplyr)
library(dbplyr)
library(lubridate)
# create twitter api token
twitterToken <- rtweet_bot(
api_key = "*****",
api_secret = "*****",
access_token = "*****",
access_secret = "*****"
)
# search tweets
tweets <- search_tweets(q = "beautiful", n = 500, type = "recent", include_rts = FALSE, token = twitterToken)
tweets$Topic <- "beautiful"
tweets$created_at <- parse_date_time(tweets$created_at, orders = c("%a %b %d %T %z %Y", "%Y-%m-%d %H:%M:%S"))
tweets$screen_name <- users_data(tweets)$screen_name
tweets$status_url <- paste0("https://twitter.com/", tweets$screen_name, "/status/", tweets$id_str)
tweets <- tweets %>% select(Topic, status_url, screen_name, created_at, text, favorite_count, retweet_count)
# upload to database
con <- dbConnect(MariaDB(), dbname="dbname", username="username", password="password", host="db_host", port=db_port, ssl.ca = "ssl.pem", load_data_local_infile = TRUE)
dbWriteTable(con, "testTweetsDB", tweets, overwrite = T)
This throws the following error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"MariaDBConnection", "character", "tweets"’
This is a bit odd because it used to work before I upgraded my R version and updated all the packages. But is not the main issue. I can work around this by the following codes:
tweets <- as.data.frame(tweets)
dbWriteTable(con, "testTweetsDB", tweets, overwrite = T)
This time I get the following error:
Error: Error executing query: Invalid utf8 character string: '@Alyri_tv So beautifull girl '
The string it complains about is the first tweet that contains emojis. It works perfectly fine if I only select tweets that don't have any emojis in them. It works even if the tweets contain Chinese, Korean and other language characters. It is the emojis that are causing the problem.
The default collation for the database is utf8mb4_unicode_ci
Update 1
Here is the output from dput(tweets[1:10,])
structure(list(Topic = c("beautiful", "beautiful", "beautiful",
"beautiful", "beautiful", "beautiful", "beautiful", "beautiful",
"beautiful", "beautiful"), status_url = c("https://twitter.com/supyo192/status/1623573544645926917",
"https://twitter.com/Ghana_Ronaldo/status/1623573542267756545",
"https://twitter.com/ValentinoAndy77/status/1623573539906281479",
"https://twitter.com/Gino15618462/status/1623573536550838273",
"https://twitter.com/kaylenicodemus/status/1623573533468000256",
"https://twitter.com/beace_tw/status/1623573527688347651", "https://twitter.com/JoyceNwanochi/status/1623573525968691200",
"https://twitter.com/Adasu_d_gr8/status/1623573525742120961",
"https://twitter.com/AsadAli62560407/status/1623573523070390276",
"https://twitter.com/HidingWolfe/status/1623573516313407488"),
screen_name = c("supyo192", "Ghana_Ronaldo", "ValentinoAndy77",
"Gino15618462", "kaylenicodemus", "beace_tw", "JoyceNwanochi",
"Adasu_d_gr8", "AsadAli62560407", "HidingWolfe"), created_at = structure(c(1675946672,
1675946671, 1675946671, 1675946670, 1675946669, 1675946668,
1675946667, 1675946667, 1675946667, 1675946665), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), text = c("@meeshell___ THESE ARE BEAUTIFUL!! ❤\nLove! Love! & Good Vibes to you!! ",
"Good morning beautiful and handsome people , pls how are u doing today ?? ",
"@geegeebanks Good morning have a beautiful and happy Thursday ❤ ",
"@hossain_mita \"I learned that in feelings we are guided by mysterious laws, perhaps fate or perhaps a mirage, in any case something inexplicable. Because... after all.. there is never a reason why you fall in love. It happens and Enough.\" Good morning dear beautiful soul",
"hings are beautiful if you love them\n\n>>>>>>>>S50<<<<<<<\n>>>>>>S50<<<<<<<\n>>>>>>S50<<<<<<<<<",
"子育てママはなぜ燃える!? #ナイトブラ",
"@isaaczara_ This looks beautiful! \nWhole I'm not a designer, I've been doing some work in Canva these past months, and recently stumbled upon the \"Intro rust\" font. I love how sleek it it!\n\nMaybe some day it can feature in your series.\n\nThese are equally beautiful too.",
"Because she is extremely beautiful, gorgeous and charming does not mean you should over look her .\n\nThose red flags will swallow up everything you are seeing.",
"@farhat121212 Beautiful picture", "@MistressSnowPhD Now that is some beautiful ink. I’m sure he’d love it."
), favorite_count = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), retweet_count = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L)), row.names = c(NA, 10L), class = "data.frame")
If I try dbWriteTable(con, "testTweetsDB", tweets[c(5,6,7,9,10),], overwrite = T)
, it works fine.
If I try dbWriteTable(con, "testTweetsDB", tweets[-c(5,6,7,9,10),], overwrite = T)
, gives me the error Error: Error executing query: Invalid utf8 character string: '@meeshell___ THESE ARE BEAUTIFUL!! '
Also, the table testTweetsDB
does not exist on the database. I am relying on the dbWriteTable()
function to create the table.