Mapping sample data to actual csv data

Question

Thanks to Davy and everyone, I think I made progress. It still will not produce a line graph, but nothing in the code looks logically wrong. I take no credit here - I just cut and pasted what smarter people than me have figured out, but I still don't get a graph. Link to the GitHub csv at the end.

data = read.csv("C:/Users/12083/Desktop/librarydata.csv") # Read the data into R

head(data)                                            # Quality control, looks good
str(data)
data$dates = as.Date(data$dates, format = "%d/%m/%Y") # This formats the date as dates for R
library(tidyverse)                                    # This will import some functions that you need, specifically %>% and ggplot
# Step 0: check that the data makes sense to you
summary(data$dates)
summary(data$city)

# Step 1: filter the right data
start.date = as.Date("2003-01-02")
end.date   = as.Date("2010-05-04")

filtered = data %>% 
  filter(dates >= start.date & 
           dates <= end.date) # This will only take rows between those dates
summary(filtered)
colnames(filtered)

library(dplyr)

filtered_agg <- filtered %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum=n()) 

filtered_agg
summary(filtered_agg)
# Step 2: Plotting
# Now you can create the plot with ggplot:
# Notes: 
# I added geom_point() so that each X value gets a point. 
# I think it's easier to read. You can remove this if you like
# Also added color, because I like it, feel free to delete



# The problem is in here - somewhere
Plot = ggplot(filtered_agg, aes(x=dates, y=Location, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city))
Plot
dput

https://github.com/karl1776/chart

colnames(filtered)
 [1] "ï..Class.ID" "city" "dates" "year" "month"
 [6] "day" "cit" "Department.College" "Course.Level" "Course.Title"
[11] "Tour." "TILT." "Date.Taught" "Session.Number" "AM.PM"
[16] "Hour.Count" "Library.Instructor" "Other.Library.Instructor" "Duplicate." "Course.Instructor"
[21] "ACRL" "IPED" "Location" "Building.Room" "Distance.Class."
[26] "Location.of.Site.1" "Site.1.Number.of.Students" "Location.of.Site.2" "Site.2.Number.of.Students" "Location.of.Site.3"
[31] "Site.3.Number.of.Students" "Location.of.Site.4" "Site.4.Number.of.Students" "Location.of.Site.5" "Site.5.Number.of.Students"
[36] "Location.of.Site.6" "Site.6.Number.of.Students" "Location.of.Site.7" "Site.7.Number.of.Students" "Location.of.Site.8"
[41] "Site.8.Number.of.Students" "Location.of.Site.9" "Site.9.Number.of.Students" "Location.of.Site.10" "Site.10.Number.of.Students"

Maybe I just don't see it, but I have a hard time looking at examples with dummy data and translating that into how to load actual data from a csv file. The picture shows my output from the dummy data, which is exactly what I want. When I use my actual data nothing happens. Have I left out a ggplot command to print the plot?

library(readxl)
require(tidyverse)
require(ggplot2)
require(lubridate)
#load data
df <- read_excel("C:/Users/12083/Desktop/librarydata.xlsx")
#plot data
df_example %>%
  ggplot(aes(date,city, color=city))+
  geom_line(aes(linetype=lt))+ #you can use single string for the same linetype for all lines or a vector of strings for each data point
  scale_linetype_identity()+ #this removes the linetype from the legend
  theme_minimal()

df_example

I get this output -- this is exactly right but no plot to accompany it.

city      dates classes       lt
1       Boise 2020-01-01      52    solid
2       Boise 2020-02-01      36    solid
3       Boise 2020-03-01      69    solid
4       Boise 2020-04-01     100    solid
5       Boise 2020-05-01      72    solid
6   Pocatello 2020-01-01      82   dashed
7   Pocatello 2020-02-01      15   dashed
8   Pocatello 2020-03-01      68   dashed
9   Pocatello 2020-04-01      17   dashed
10  Pocatello 2020-05-01      51   dashed
11  Salt Lake 2020-01-01      71   dotted
12  Salt Lake 2020-02-01      65   dotted
13  Salt Lake 2020-03-01      33   dotted
14  Salt Lake 2020-04-01      44   dotted
15  Salt Lake 2020-05-01      16   dotted
16 Twin Falls 2020-01-01       3  dotdash
17 Twin Falls 2020-02-01      30  dotdash
18 Twin Falls 2020-03-01      19  dotdash
19 Twin Falls 2020-04-01      34  dotdash
20 Twin Falls 2020-05-01      69  dotdash
21  Elsewhere 2020-01-01      62 longdash
22  Elsewhere 2020-02-01      14 longdash
23  Elsewhere 2020-03-01      59 longdash
24  Elsewhere 2020-04-01      35 longdash
25  Elsewhere 2020-05-01      91 longdash

dput

structure(list(`Class ID` = c(4438, 4439, 4428, 4437, 4430, 4431, 
4432, 4433, 4434, 4435, 4436, 4427, 4440, 4417, 4414, 4407, 4413, 
4412, 4418, 4410), city = c("Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Meridian", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Idaho Falls"), date = structure(c(1468972800, 1468972800, 
1468886400, 1468800000, 1468454400, 1468454400, 1468368000, 1468368000, 
1468368000, 1468281600, 1468281600, 1466553600, 1466553600, 1461283200, 
1460592000, 1460419200, 1460419200, 1460073600, 1460073600, 1459987200
), tzone = "UTC", class = c("POSIXct", "POSIXt")), year = c(2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016), month = c(7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 4, 4, 4, 4, 4, 4, 4), day = c(20, 
20, 29, 18, 14, 14, 13, 13, 13, 12, 12, 22, 22, 22, 13, 12, 12, 
8, 8, 7), cit = c("Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Meridian", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Idaho Falls"), `Department/College` = c("College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "Library", "Library", "Library", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Education", "Library", "Division of Health Sciecnes", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters"), 
    `Course Level` = c("Lower Division", "Lower Division", "Lower Division", 
    "Lower Division", "Lower Division", "Lower Division", "K-12", 
    "K-12", "K-12", "Lower Division", "Lower Division", "Lower Division", 
    "K-12", "Graduate", "Lower Division", "Lower Division", "Lower Division", 
    "Lower Division", "Lower Division", "Lower Division"), `Course Title` = c("ACAD 1111", 
    "ACAD 1111", "POLS 1110", "ENGL 1123", "ACAD 1111", "ACAD 1111", 
    "Kid University", "Kid University", "Kid University", "ACAD 1111", 
    "ACAD 1111", "EDUC 1110", "Kid University", "Nursing_Orientation", 
    "ENGL 1102", "ENGL 1101", "ENGL 1101", "ENGL 1102", "ENGL 1102", 
    "ENGL 1102"), `Tour?` = c(FALSE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, TRUE, FALSE), `TILT?` = c(FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
    ), `Date Taught` = structure(c(1468972800, 1468972800, 1468886400, 
    1468800000, 1468454400, 1468454400, 1468368000, 1468368000, 
    1468368000, 1468281600, 1468281600, 1466553600, 1466553600, 
    1461283200, 1460592000, 1460419200, 1460419200, 1460073600, 
    1460073600, 1459987200), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), `Session Number` = c("Third Session", "Third Session", 
    "Single Session", NA, "Second Session", "Second Session", 
    "Single Session", "Single Session", "Single Session", "First Session", 
    "First Session", "Single Session", "Single Session", "Single Session", 
    "Single Session", "Single Session", "First Session", "Third Session", 
    "Third Session", "Second Session"), `AM/PM` = c("AM", "PM", 
    "PM", "PM", "AM", "PM", "PM", "PM", "PM", "AM", "PM", "PM", 
    "PM", "AM", "PM", "PM", "AM", "AM", "AM", "AM"), `Hour Count` = c(1.5, 
    1.5, 1, 1.5, 1.5, 1.5, 0.5, 0.5, 1, 1.5, 1.5, 1.5, 1, 1, 
    1.5, 1.5, 1.5, 1, 1, 1.5), 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Cathy Gray", 
    NA, NA, NA, NA, "Monte Asche", "Philip Homan", NA), `Duplicate?` = c(FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE), ACRL = c(0, 0, 7, 5, 0, 0, 7, 7, 7, 22, 9, 
    8, 13, 35, 19, 6, 8, 0, 0, 0), IPED = c(22, 9, 7, 5, 23, 
    9, 7, 7, 7, 22, 9, 8, 13, 35, 19, 6, 8, 19, 19, 22), `Location of Instructor` = c("Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Meridian", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Idaho Falls"), `Building/Room` = c("LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", 
    "Special Collections", "LIBR 212", "LIBR 212", "LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "Meridian", "LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", "CHE 306"
    ), `Distance Class?` = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), `Location of Site 1` = c("Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise"), `Site 1 Number of Students` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    `Location of Site 2` = c("Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls"), `Site 2 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 3` = c("Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls"), 
    `Site 3 Number of Students` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 4` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `Site 4 Number of Students` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    `Location of Site 5` = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), `Site 5 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 6` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 6 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 7` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 7 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 8` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 8 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 9` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 9 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 10` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 10 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))
>

Hi Karl, could you please clarify what your question is? The line `df_example = data.frame(city=log),(dates=date),(classes=class) ` is not valid R code. What are you trying to do? — Ian Campbell, Jul 21 '20 at 02:06
I am trying to import a csv file - which has dates, city names, and number of classes - to create a line graph of this data over time. My issue is I don’t understand how to translate my csv data into a format that this graph will recognize. I have dummy data, the graph works, now I need to link my real data from the csv file to graph that — Karl Bridges, Jul 21 '20 at 02:20
df_example %>% ggplot(aes(dates,classes, color=city))+ geom_line(aes(linetype=lt))+ #you can use single string for the same linetype for all lines or a vector of strings for each data point scale_linetype_identity()+ #this removes the linetype from the legend theme_minimal() — Karl Bridges, Jul 21 '20 at 02:20
Unfortunately, no one can help without either 1) the `.csv` file, or 2) the output of `dput(df)` or if your data is very large `dput(df[1:20,])`. You can [edit] your question and paste the output. Please surround the output with three backticks (```) for better formatting. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/) for more info. — Ian Campbell, Jul 21 '20 at 02:28

score 1 · Answer 1 · answered Jul 21 '20 at 13:45

OP, it seems you're having some trouble generally with how to import data from a *.csv and translate that into your desired plot. Since it seems you're able to create a plot, I'll gloss over that part and walk you through an example of a good way to approach importing data, then ensuring you can translate that to your plot.

Importing the .csv file and preparing the data

I will start with a .csv file that I have created using the output you posted of df_example in your question. I exported that data to a *.csv file, and now we can import it:

df <- read.csv('OP_example.csv')

The first step once you import the data is to ensure it "looks right" and to get an idea of the structure. Even when you created the file yourself, it's very important to ensure df looks the way it should. Here, head(), str(), and summary() are your friends.

> head(df)
  X      city      dates classes     lt
1 1     Boise 2020-01-01      52  solid
2 2     Boise 2020-02-01      36  solid
3 3     Boise 2020-03-01      69  solid
4 4     Boise 2020-04-01     100  solid
5 5     Boise 2020-05-01      72  solid
6 6 Pocatello 2020-01-01      82 dashed

> str(df)
'data.frame':   25 obs. of  5 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ city   : chr  "Boise" "Boise" "Boise" "Boise" ...
 $ dates  : chr  "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" ...
 $ classes: int  52 36 69 100 72 82 15 68 17 51 ...
 $ lt     : chr  "solid" "solid" "solid" "solid" ...

You can see that in writing the *.csv file, it created an "X" column that's just the row number. No big deal. We also have everything else looking fine, except that you'll notice that df$dates is read in as a chr, not as a Date or another date-like class. Since I'm going to create a plot using this column, I will need it as a date:

> df$dates <- as.Date(df$dates, format='%Y-%m-%d')

> str(df)
'data.frame':   25 obs. of  5 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ city   : chr  "Boise" "Boise" "Boise" "Boise" ...
 $ dates  : Date, format: "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" ...
 $ classes: int  52 36 69 100 72 82 15 68 17 51 ...
 $ lt     : chr  "solid" "solid" "solid" "solid" ...

Notice that I specify the format= for the date. You'll find information on the nomenclature of % associated with format= within the documentation for the strptime() function. When I run str() again on df, you'll see that df$dates is now a Date class instead of chr.
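As a minimal sketch of why the format string matters (the date strings below are made up, not taken from OP's file): when format= does not match the text, as.Date() returns NA without raising an error, so it is worth counting the NAs right after converting.

```r
# Hypothetical date strings, written day/month/year
raw <- c("02/01/2003", "04/05/2010")

# Matching format: parsed as 2 Jan 2003 and 4 May 2010
ok <- as.Date(raw, format = "%d/%m/%Y")

# Mismatched format: no error is raised, every element becomes NA
bad <- as.Date(raw, format = "%Y-%m-%d")

print(ok)
print(bad)

# Quick check worth running after any date conversion
print(sum(is.na(bad)))
```

If that count is not zero on real data, the format= string probably does not describe how the dates are written in the file, and any later filter on the date column would silently drop those rows.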

Plotting

Now for the plot, just make sure that you are reading and plotting the correct dataframe. From your code example... you are plotting using df_example, but reading in df. Not sure if that was a typo.

Your preference appears to be using the pipe %>% command, rather than stating the dataframe within ggplot(), so I'll do that here:

df %>%
  ggplot(aes(x=dates, y=classes, color=city)) +
  geom_line() + geom_point() + theme_bw()

Giving you: [image: line plot of classes against dates, one colored line with points per city]

Hope that helps you out. Since we don't have your particular *.csv file and you are not having trouble plotting a particular data frame, the most likely source of difficulty is making sure that, when you read in your file, the columns and classes of your data are in the format you expect. Additionally, please ensure your code is calling to plot the correct data frame.
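One way to make that check explicit is sketched below. The data frame is a hypothetical stand-in for an imported file, and the names in `needed` are simply whichever columns your aes() call refers to; nothing here is specific to OP's data.

```r
# Hypothetical stand-in for a freshly imported data frame
df <- data.frame(
  city    = c("Boise", "Boise", "Pocatello", "Pocatello"),
  dates   = c("2020-01-01", "2020-02-01", "2020-01-01", "2020-02-01"),
  classes = c(52, 36, 82, 15)
)

# Columns the plot will refer to inside aes()
needed <- c("dates", "classes", "city")

# Any name printed here is missing from df (typo, different case, ...)
missing_cols <- setdiff(needed, names(df))
print(missing_cols)

# Convert the date column, then confirm the classes are what the plot expects
df$dates <- as.Date(df$dates, format = "%Y-%m-%d")
stopifnot(
  length(missing_cols) == 0,
  inherits(df$dates, "Date"),
  is.numeric(df$classes)
)
```

If stopifnot() halts, the message names the condition that failed, which points at the column to fix before moving on to ggplot().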

Looks good...a couple things I noticed - `dates` is included in `ggplot` which matches correctly the data frame column (OP had `date` instead)...also, could add `linetype = lt` to aesthetic which OP had as well, and if dots/points not desired, could just include `group = city` in aesthetic and leave out `geom_point`...depending on preferences...appreciate the detailed answer provided. — Ben, Jul 21 '20 at 14:01
Yes, I would wager a guess that it's something like that which is causing OP some trouble. It's why it's important to focus on what the import of .csv gets you. We do this sort of thing all the time, right - it's almost a ritual now: (1) import file, (2) str, head, summary... (3) alter and re-class stuff, (4) build plots — chemdork123, Jul 21 '20 at 14:31

score 0 · Answer 2 · davy · 2020-07-21T19:02:32.070

Aggregating and Plotting

dplyr allows for easy aggregation of the data. This code will create a new dataset with a count of the number of times each value of the 'Location' variable appears within each unique combination of city and date:

library(dplyr)

filtered_agg <- filtered %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum=n()) 

filtered_agg
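To see what that aggregation produces, here is a small sketch on made-up rows (one row per teaching session; the values are invented for illustration). It assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Made-up rows: one row per teaching session
sessions <- data.frame(
  city     = c("Boise", "Boise", "Boise", "Pocatello"),
  dates    = as.Date(c("2020-01-01", "2020-01-01", "2020-02-01", "2020-01-01")),
  Location = c("Library", "Library", "Library", "Library")
)

# Same pattern as above: one output row per city/dates/Location combination
agg <- sessions %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum = n(), .groups = "drop")

print(agg)

# count() is shorthand for the same group_by() + summarize(n()) pattern
agg2 <- sessions %>%
  count(city, dates, Location, name = "location_sum")

print(agg2)
```

The new location_sum column is numeric, which is what a line graph needs on the y axis; Location itself is text, so mapping it to y will not give counts over time.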

For the plot, something like this should give you a result:

Plot = ggplot(filtered_agg, aes(x=dates, y=location_sum, group = city)) +
  geom_line(aes(linetype=city, color = city)) +
  geom_point(aes(color=city))

Plot

But it seems like you have one too many dimensions for a simple line graph. If the number of cities is not too large, facet_wrap would make for a more readable plot (you could also swap the roles of city and Location):

ggplot(filtered_agg, aes(x=dates, y=location_sum)) +
  geom_line(aes(linetype=Location, color = Location)) +
  geom_point(aes(color=Location)) +
  facet_wrap(~city)

Loading Data

Does the line log = df$city work (it would return an error message if it didn't)? If yes, it looks like you are overthinking it. You can skip the steps involved in creating df_example and just use df directly in your ggplot command:

library(readxl)
library(ggplot2)
library(dplyr)   # provides the %>% pipe used below

df <- read_excel("C:/Users/12083/Desktop/librarydata.xlsx")

df %>%
   ggplot(aes(dates,classes, color=city))+
   geom_line(aes(linetype=lt))+ 
   scale_linetype_identity()+ #this removes the linetype from the legend
   theme_minimal()

If this doesn't work you probably need to adjust the options in the read_excel command.
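Another thing to check: the dput() output in the question suggests that in the spreadsheet the date column is called date (not dates), that it is stored as POSIXct, and that there is one row per class session with no classes or lt column. If that is right, a sketch along these lines may be closer to what is needed. The data frame below is a hypothetical stand-in for the read_excel() result, built from a few values in that dput() sample.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for read_excel() output, shaped like the dput() sample
df <- data.frame(
  city = c("Pocatello", "Pocatello", "Pocatello", "Meridian", "Idaho Falls"),
  date = as.POSIXct(c("2016-07-20", "2016-07-20", "2016-07-19",
                      "2016-04-22", "2016-04-07"), tz = "UTC")
)

# One row per class session, so count rows to get classes per city per day
classes_per_day <- df %>%
  mutate(date = as.Date(date)) %>%      # POSIXct -> Date
  count(city, date, name = "classes")

print(classes_per_day)

p <- ggplot(classes_per_day, aes(x = date, y = classes, color = city)) +
  geom_line() +
  geom_point()

# Explicit print() is needed inside functions and loops,
# and when a script is run via source()
print(p)
```

Cities with a single date will show only a point, since a line needs at least two observations.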

edited Jul 21 '20 at 19:02

answered Jul 21 '20 at 02:54

davy


This doesn't work. Please see my edit to my original question. It's odd -- the df_ example writes out the graph but doesn't actually graph it. I assume I have left out a command??? – Karl Bridges Jul 21 '20 at 03:39
I have spent another hour on this. It will print out the details of the graph but won't actually graph. This is maddening. This is the hardest product I have ever tried to use in my life – Karl Bridges Jul 21 '20 at 04:25
chemdork has some great debugging suggestions above. When you run `str(df)` does your output look similar to theirs? You are not alone, R definitely has a steeper learning curve than many tools. Have you tried the 'Import Dataset' button in RStudio? That will allow you to test different options in the `read_excel` command using buttons on menus rather than code and gives you a nice preview of the data. It also shows you the code used to produce that result, so not a bad way to learn how to import data. – davy Jul 21 '20 at 15:04
The issue I have now is that it won't plot Plot = ggplot(filtered, aes(x=dates, y=location, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city)) It's giving me an error about Don't know how to automatically pick scale for object of type function. Defaulting to continuous. Error: Aesthetics must be valid data columns. Problematic aesthetic(s): y = location. Did you mistype the name of a data column or forget to add after_stat()? – Karl Bridges Jul 21 '20 at 15:36
Looks like the problem is that you have loaded the `dplyr` package, which has a function called `location`. So `ggplot` thinks you are referencing the function instead of the dataframe column called 'location'. I think all you need to is add backticks before and after the word location: ````Plot = ggplot(filtered, aes(x=dates, y=`location`, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city))```` – davy Jul 21 '20 at 16:13
Nope. That makes no difference. I assume by backticks you mean the character under the tilde on the key to the left of number 1 – Karl Bridges Jul 21 '20 at 16:22
Yes, the backtick is on the same key as the tilde. If that isn't working it means that there isn't a column called location in the filtered dataframe. You can use this command to see which columns are present and check for typos: `colnames(filtered)`. If that doesn't lead to an obvious fix please update your original question to show how you are creating the filtered dataframe used in your `ggplot` command - it could be that the location column is not being carried over to the new dataframe. – davy Jul 21 '20 at 16:50
I have made the update as requested. This is truly frustrating. The whole thing is just a summary of teaching sessions by location by date to be expressed in a line graph. I don't understand why this is so difficult. – Karl Bridges Jul 21 '20 at 17:04
I think the problem is that I need to do a count of the categorical variable location - I need to have the categorical variable location summed by city so that I can plot it by date but I don’t see how to do this. – Karl Bridges Jul 21 '20 at 17:55
I just added example code for creating that sum to the top of my answer. – davy Jul 21 '20 at 18:25
Thanks. It doesn't appear to do any good -- I still have no output when I plot. Sorry to be stupid, but I substituted filtered_agg for filtered with no change in the output. I get the idea - I filter the rows I want, summarize them, and then plot them Should I remove the group in the plot since it is already summarized??...Plot = ggplot(filtered_agg, aes(x=dates, y=Location, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city)) Plot – Karl Bridges Jul 21 '20 at 18:50
No worries, it is a very finicky language. Updated my answer again with some options for the plot. – davy Jul 21 '20 at 19:03
Thanks. I have a book on r graphics coming in the mail. Thanks for the assist - I’m a 60 something classics librarian stuck in quarantine so my bosses thought me learning to program would be “easy” and “fun”. I am figuring out R with YouTube Dr. Google and a few helpful people like you. Never thought this would be part of my job. Greek verbs are more difficult but not much. – Karl Bridges Jul 21 '20 at 19:07
