I'm trying to color the points in this correlation scatterplot with three different colors: black for normal cell lines, red for cancer cell lines, and blue for tumors. I set up my ggplot and labeled the points, but because each cell line being compared has its own unique name, I can't figure out how to group the data with dplyr or in Excel. I tried splitting my data into separate data frames and passing each one to geom_point with its own color, but that is not working. I'd appreciate any help I can get.
#create master frame
df <- my_file
names <- rownames(df)
df <- as.data.frame(df)
class(df)
normals <- df[1:21]
cell_lines <- df[22:60]
tumors <- df[61:219]
row.names(df) <- c("PTPRO", "PPP1R14C", "IRF8", "TSPAN8","NKD1")
#create new data frame for 2 specific sets
row_to_keep <- c(TRUE,TRUE,FALSE,FALSE,FALSE)
df1 <- df[row_to_keep,]
# create scatter plot
df1 <- data.frame(t(df1)) # transpose the subset so samples are rows and genes are columns
library(ggplot2)
ggplot(df1, aes(x = PTPRO, y = PPP1R14C)) +
  geom_point() + # scatter plot
  coord_trans(x = "log10", y = "log10") + # logarithmic axes
  geom_text( # label each point with its sample name
    label = rownames(df1),
    nudge_x = 0.25, nudge_y = 0.25,
    check_overlap = TRUE
  )
Here is my current graph
I want it to look something like the reproducible example at https://r-charts.com/correlation/scatter-plot-group-ggplot2/
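
To make the goal concrete, here is a minimal sketch of the kind of grouping I'm after. It assumes the column order in my file matches the index ranges above (columns 1-21 normals, 22-60 cancer cell lines, 61-219 tumors). The random toy matrix `mat`, the `plot_df` data frame, and the `group` column are made up for illustration and are not my real data.

```r
library(ggplot2)

# Toy stand-in for my_file: 5 genes (rows) x 219 samples (columns), random values
set.seed(1)
genes <- c("PTPRO", "PPP1R14C", "IRF8", "TSPAN8", "NKD1")
mat <- matrix(rlnorm(5 * 219), nrow = 5,
              dimnames = list(genes, paste0("sample", 1:219)))

# Transpose so each sample is a row and each gene is a column
plot_df <- data.frame(t(mat))

# Assumption: samples are ordered normals, then cell lines, then tumors,
# so the group can be assigned by position instead of by sample name
plot_df$group <- factor(
  rep(c("normal", "cell line", "tumor"), times = c(21, 39, 159)),
  levels = c("normal", "cell line", "tumor")
)

# Map the group to color and set the three colors by hand
p <- ggplot(plot_df, aes(x = PTPRO, y = PPP1R14C, color = group)) +
  geom_point() +
  scale_color_manual(values = c("normal" = "black",
                                "cell line" = "red",
                                "tumor" = "blue")) +
  coord_trans(x = "log10", y = "log10")
print(p)
```

The idea is that a single `group` column built from the column positions would replace the three separate data frames, so one `geom_point()` layer could handle all the coloring.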