
I have a dataset as CSV with three columns:

  • timestamp (e.g. 2018/12/15)
  • keyword (e.g. "hello")
  • count (e.g. 7)

I want one plot where all the points of the same keyword are connected by a line, with timestamp on the x-axis and count on the y-axis. Each keyword should get its own line color, and each line should be labeled with its keyword.

The CSV has only ~30,000 rows and R runs on a dedicated machine, so performance can be ignored.

I tried various approaches with matplot and ggplot from this forum, but didn't get them to work with my own data.

What is the easiest solution to do this in R?

Thanks!

EDIT:

I tried customizing Roman's code as follows:

`csvdata <- read.csv("c:/mydataset.csv", header=TRUE, sep=",")  

time <- csvdata$timestamp  
count <- csvdata$count  
keyword <- csvdata$keyword  

time <- rep(time)  
xy <- data.frame(time, word = c(keyword), count, lambda = 5)  

library(ggplot2)  

ggplot(xy, aes(x = time, y = count, color = keyword)) +  
  theme_bw() +  
  scale_color_brewer(palette = "Set1") +  # choose appropriate palette  
  geom_line()`

This creates a correct canvas, but no points/lines in it...

DATA:

head(csvdata)

keyword count  timestamp
1 non-distinct-word     3 2018/08/09
2 non-distinct-word     2 2018/08/10
3 non-distinct-word     3 2018/08/11

str(csvdata)

'data.frame':   121 obs. of  3 variables:
 $ keyword  : Factor w/ 10 levels "non-distinct-word",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ count    : int  3 2 3 1 6 6 2 3 2 1 ...
 $ timestamp: Factor w/ 103 levels "2018/08/09","2018/08/10",..: 1 2 3 4 5 6 7 8 9 10 ...
to_the_nth
  • I suggest you provide a *reproducible* question. This includes sample code (including listing non-base R packages) and sample data (e.g., `dput(head(x))`). Showing the code you've tried and stating why it is not correct is a very good step. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Dec 16 '18 at 18:32
  • Have you considered coercing the `timestamp` to a proper `Date` object? Try something like `as.Date(as.character(csvdata$timestamp), format = "%Y-%m-%d")`. – Roman Luštrik Dec 18 '18 at 13:04
  • If I do that, it throws `Error in seq.int(0, to0 - from, by) : 'to' cannot be NA, NaN or infinite`. My current workaround is to convert with `as.numeric`, which works but does not display the timestamp axis correctly (it just shows the numbers), so I take a screenshot and add the axis in Photoshop... – to_the_nth Dec 19 '18 at 21:49
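
A possible explanation for both the empty canvas and the `seq.int` error, sketched under assumptions: the `str` output shows `timestamp` as a factor with values like `2018/08/09`. A format string with dashes does not match those slashes, so `as.Date` returns `NA` for every row, which is likely what the error complains about. With a factor on the x-axis, ggplot2 also places each discrete x/color combination in its own group by default, so `geom_line` has nothing to connect. The sketch below uses a small made-up stand-in for `csvdata` (the values are illustrative, not the real data) and a slash-based format string:

```r
library(ggplot2)

# Hypothetical stand-in for csvdata: keyword and timestamp are factors,
# mirroring the str() output in the question. The values are made up.
csvdata <- data.frame(
  keyword   = factor(rep(c("hello", "world"), each = 3)),
  count     = c(3L, 2L, 3L, 1L, 6L, 6L),
  timestamp = factor(rep(c("2018/08/09", "2018/08/10", "2018/08/11"), times = 2))
)

# Convert factor -> character -> Date; the format must match the slashes in the data.
csvdata$timestamp <- as.Date(as.character(csvdata$timestamp), format = "%Y/%m/%d")

# A mismatched format (e.g. "%Y-%m-%d") yields NA, so check before plotting.
stopifnot(!anyNA(csvdata$timestamp))

# With a Date on the x-axis, the lines are grouped by keyword via the color mapping.
p <- ggplot(csvdata, aes(x = timestamp, y = count, color = keyword)) +
  theme_bw() +
  geom_line() +
  scale_x_date(date_labels = "%Y/%m/%d")  # print axis ticks in the original format

print(p)
```

If the real CSV has more keywords than a palette offers (`"Set1"` has nine colors, and the `str` output lists ten levels), the default color scale used here avoids missing colors.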

1 Answer


Something like this?

# Generate some data. This is the part the poster of the question normally provides.
today <- as.Date(Sys.time())
time <- rep(seq.Date(from = today, to = today + 30, by = "day"), each = 2)
xy <- data.frame(time, word = c("hello", "world"), count = rpois(length(time), lambda = 5))

library(ggplot2)

ggplot(xy, aes(x = time, y = count, color = word)) +
  theme_bw() +
  scale_color_brewer(palette = "Set1") +  # choose appropriate palette
  geom_line()


Roman Luštrik
  • Roman, thanks so much for your reply. I know this is probably a noob question. I load the data from a CSV and tried it with your code. (I couldn't add it in the comment, so I updated the original question.) It loads the canvas, but does not fill in any points or lines. Is there something obviously wrong with my approach? Again, thanks, I know your time is very valuable! – to_the_nth Dec 16 '18 at 22:03
  • @to_the_nth please show what your data looks like. You can use `head` and `str` functions. – Roman Luštrik Dec 17 '18 at 14:03