Hello, I need my ggplot to show dates on the x-axis in this format:

final outlook.

But my date column also includes the time of day.

sentiment_bing1 <- tidy_trump_tweets %>% 
  inner_join(get_sentiments("bing")) %>% 
  count(word, created_at, sentiment) %>% 
  ungroup()
p <- sentiment_bing1 %>% 
  filter(sentiment == "positive") %>% 
  ggplot(aes(x = created_at, y = n)) + 
  geom_line(stat = "identity", position = "identity", color = "Blue") +
  scale_x_date(date_breaks = '3 months', date_labels = '%b-%Y') + 
  stat_smooth() + 
  theme_gdocs() +
  xlab("Date") + 
  ylab("Normalized Frequency of Positive Words in Trump's Tweets")

1              abound 11/30/17 13:05  positive 0.0
2               abuse  1/11/18 12:33  negative 0.0
3               abuse  10/27/17 1:18  negative 0.0
4               abuse  2/18/18 17:10  negative 0.0

This is what I have done so far. How do I get a result like the one in the picture? Converting to Date doesn't help, because several tweets can fall on the same day at different times, and that messes up the graph.

hrbrmstr

2 Answers

Welcome to SO!

It's hard to answer your question without seeing the data you are using and the error your code generates. Next time, try to include a reproducible example. That makes it much easier for someone to identify where the problem lies.

Based on the code and data you've provided I've created a sample data set with a (broadly) similar structure to that from the chart...

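library(lubridate)
library(ggplot2)
library(ggthemes)
library(dplyr)  # for %>%, mutate(), case_when(), tally()

set.seed(100)
start_date <- mdy_hm("03-01-2017-12:00")
end_date <- mdy_hm("03-01-2018-12:00")
# number of hours between the two timestamps
number_hours <- interval(start_date, end_date)/hours(1)

# one timestamp per hour, starting 6 hours after start_date
created_at <- start_date + hours(6:number_hours)
length(created_at)
# randomly assign a word to each timestamp (25% "abound", 75% "abuse")
word <- sample(c("abound", "abuse"), size = length(created_at), replace = TRUE, 
    prob=c(0.25, 0.75))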

Your plotting code looks good. I could be wrong here, but from what I can tell your problem could lie in the way you are summarising the frequencies. In the code below, I've used the lubridate package to group your data by date (day), which gives a daily frequency count.

# tibble() replaces the deprecated data_frame()
test_plot <- tibble(created_at, word) %>%
   mutate(sentiment = 
       case_when(
         word == "abound" ~ "positive",
         word == "abuse" ~ "negative")) %>%
   filter(sentiment == "positive") %>% 
   # created_at is already POSIXct, so no re-parsing is needed.
   # round_date() rounds to the nearest day (noon onward goes to the next day);
   # use floor_date() instead to keep each timestamp on its calendar day.
   mutate(created_at = date(round_date(created_at, unit = "day"))) %>%
   group_by(created_at) %>%
   tally() %>%
   ggplot() +
     aes(x = created_at, y = n) + 
     geom_line(stat="identity", position = "identity", color = "Blue") +  
     geom_smooth() +
     scale_x_date(date_breaks ='3 months', date_labels = '%b-%Y') + 
     theme_gdocs() +
     xlab("Date") + 
     ylab("Frequency of Positive Words in Trump's Tweets")

Which gives you this...

(image: the resulting plot)
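If you would rather keep the time of day than collapse everything to dates, another option is to parse the strings into POSIXct and switch to scale_x_datetime(). This is only a sketch: the tweets data frame and its n values below are made up to mirror the "m/d/yy H:MM" strings in your sample, and I'm assuming UTC for the time zone.

```r
library(ggplot2)

# Hypothetical sample mirroring the timestamp strings shown in the question
tweets <- data.frame(
  created_at = c("11/30/17 13:05", "1/11/18 12:33", "10/27/17 1:18", "2/18/18 17:10"),
  n = c(1, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Parse the strings into POSIXct so the time of day is preserved (UTC is an assumption)
tweets$created_at <- as.POSIXct(tweets$created_at,
                                format = "%m/%d/%y %H:%M", tz = "UTC")

# scale_x_datetime() is the POSIXct counterpart of scale_x_date()
p <- ggplot(tweets, aes(x = created_at, y = n)) +
  geom_line(color = "blue") +
  scale_x_datetime(date_breaks = "3 months", date_labels = "%b-%Y")
```

Tweets from the same day then stay at distinct x positions, while the axis labels still read month-year.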

vengefulsealion

# spread() comes from tidyr; the rest is dplyr and tidytext
sentiment_bing1 <- tidy_trump_tweets %>% 
  inner_join(get_sentiments("bing")) %>% 
  count(created_at, sentiment) %>% 
  spread(sentiment, n, fill = 0) %>%
  # min-max normalise each count column to [0, 1]; refer to the columns directly,
  # since sentiment_bing1 does not exist yet inside its own pipeline
  mutate(N = (negative - min(negative)) / (max(negative) - min(negative)),
         P = (positive - min(positive)) / (max(positive) - min(positive))) %>%
  ungroup()
# drop the time of day so the x-axis can use scale_x_date()
sentiment_bing1$created_at <- as.Date(sentiment_bing1$created_at, "%m/%d/%y")

Using spread() separated the positive and negative counts into their own columns, which made the normalization straightforward and gave me the result I was looking for!
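For anyone reusing this, the normalization step can be pulled out into a small helper. This is just a sketch in base R: min_max is a name I made up, and the counts data frame holds invented numbers for illustration, not values from the tweet data.

```r
# Hypothetical helper: rescale a numeric vector to the [0, 1] range
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # a constant column would give 0/0, so return zeros instead
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up counts, for illustration only
counts <- data.frame(negative = c(0, 2, 4), positive = c(1, 1, 3))
counts$N <- min_max(counts$negative)
counts$P <- min_max(counts$positive)
```

The guard for constant columns is my own addition; without it, a sentiment that has the same count on every row would produce NaN.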