Analyzing tweets in RStudio:
My CSV file contains 4,000,000 tweets with five columns: screen_name, text, created_at, favorite_count, and retweet_count.
I am trying to count how often each hashtag appears using the following code. However, it is extremely slow (it can run for several days), and RStudio sometimes crashes.
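library(dplyr)     # provides %>% and anti_join()
library(tidytext)  # provides unnest_tokens() and the stop_words data set
mydata %>%
  unnest_tokens(word, text, token = "tweets") %>%  # split tweet text into tokens; the "tweets" tokenizer keeps #hashtags
  anti_join(stop_words, by = "word")               # drop common stop words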
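To make the goal concrete, here is a minimal sketch of the result I am after, using base R only. The three-row `toy` data frame and the hashtag regular expression are made-up assumptions for illustration, not my real data or a definitive definition of a hashtag.

```r
# Toy stand-in for the real data (made-up tweets, for illustration only)
toy <- data.frame(
  text = c("Loving #rstats and #DataScience",
           "More #rstats today",
           "No tags here"),
  stringsAsFactors = FALSE
)

# Extract every hashtag with a regular expression instead of tokenizing all words
# (assumed pattern: '#' followed by letters, digits, or underscores)
matches <- regmatches(toy$text, gregexpr("#[[:alnum:]_]+", toy$text))

# Flatten the list, lower-case the tags, and count them, most frequent first
hashtag_counts <- sort(table(tolower(unlist(matches))), decreasing = TRUE)
print(hashtag_counts)
```

On the full data set I would want the same kind of table: one row per hashtag with its count.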
I have also tried other approaches for handling big data in R, such as https://rviews.rstudio.com/2019/07/17/3-big-data-strategies-for-r/ and the Spark text-mining guide at https://spark.rstudio.com/guides/textmining/. None of them worked well for me.
In Spark, I do the following, but RStudio is unable to copy my dataset to Spark. RStudio shows "Spark is Running" for as long as a full day without the copy ever finishing.
library(sparklyr)
# Connect to a local Spark instance
spark_conn <- spark_connect(master = "local")
# Copy my dataset (my_database) to Spark
track_metadata_tbl <- copy_to(spark_conn, my_database)
Do you have any suggestions/instructions/links that would help me analyze my data?
My laptop is a Mac with a 2.9 GHz Dual-Core Intel Core i5 processor and 8 GB of 2133 MHz LPDDR3 memory.