
I would like to create a dot plot for my data set. I know how to create a normal dot plot for treatment comparisons or similar data sets using ggplot2. For the data below, I would like a dot plot with three different colors. How should I prepare the data for this plot? If NP and P each had a single data point, it would be easy, because I have already worked with data like that, but I have no idea how to handle this kind of data. I am using R with ggplot2.

The variable W always has a single data point, while NP and P have a varying number of data points: sometimes one value in NP and sometimes three, and the same for P, as shown in the table.
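ggplot2 needs one row per plotted point, each with exactly one x and one y value. A minimal sketch of that long ("tidy") layout, using the first two IDs from the sample table shown in the answer; the column names `var` and `value` are assumptions, not names from the original file:

```r
library(tibble)
library(ggplot2)

# Long layout: one row per plotted point.
# ID goes on the x-axis, value on the y-axis, and var (W, NP or P) is
# mapped to colour. W contributes one row per ID; NP and P contribute
# one row per value, however many there are.
long <- tribble(
  ~ID,  ~var, ~value,
  "1",  "W",  4.161,
  "1",  "NP", 1.3,
  "1",  "NP", 1.5,
  "1",  "P",  1.5,
  "1",  "P",  2.8,
  "2",  "W",  0.891,
  "2",  "NP", 1.33,
  "2",  "NP", 1.8,
  "2",  "NP", 1.79,
  "2",  "P",  1.6
)

p <- ggplot(long, aes(x = ID, y = value, colour = var)) +
  geom_point(size = 1.5)
```

Once the data has this shape, the varying number of NP and P values per ID no longer matters.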

Here is a screenshot of my data (sample data, tab-separated).

I would like something similar to this image. Sorry for my English.

I agree my data is a mess. I searched online and wrote some code to get the plot, using the tidyverse and dplyr packages, but there is still a problem with the y-axis: it is very cluttered. I used the following code:

library(tidyverse)

# read.table() already returns a data frame
df <- read.table("Data1.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)

df <- df %>%
 mutate(across(everything(), as.character)) %>%
 pivot_longer(!ID, names_to="colid", values_to="val") %>%
 separate_rows(val, sep="\t", convert=TRUE) %>%
 mutate(ID=as_factor(ID))

Then I plotted the graph with ggplot:

ggplot(df, aes(x = ID, y = val, color = colid)) +
  geom_point(size = 1.5) +
  theme(axis.text.x = element_text(angle = 90))

The output is shown below. I tried to adjust the y-axis with ylim() and scale_y_discrete(), but nothing worked. How can I fix it?

[Image: output plot]
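A likely cause of the cluttered y-axis: if `val` is still character after the pipeline, ggplot2 uses a discrete y scale with one tick per distinct string, which `ylim()` and `scale_y_discrete()` cannot turn into a numeric axis. Splitting on `"\t"` would leave a cell such as `"1.3,1.5"` intact, so it cannot be parsed as a number. A sketch of the same pipeline, assuming (as in the answer's table) that several values inside one cell are separated by commas; the inline `d` is a hypothetical stand-in for `Data1.txt`:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for Data1.txt, values taken from the answer's table
d <- data.frame(
  ID = c(1, 2, 3, 40),
  W  = c(4.161, 0.891, 7.91, 2.1),
  NP = c("1.3,1.5", "1.33,1.8,1.79", "4.3", "1.4,0.99,7.9,0.32"),
  P  = c("1.5,2.8", "1.6", "0.899,1.43,0.128", "0.6,0.5,1.57")
)

df <- d %>%
  mutate(across(everything(), as.character)) %>%
  pivot_longer(!ID, names_to = "colid", values_to = "val") %>%
  # split on the comma inside each cell, giving one row per value
  separate_rows(val, sep = ",") %>%
  mutate(
    val = as.numeric(val),                 # numeric, so the y scale is continuous
    ID  = factor(ID, levels = unique(ID))  # keep IDs in file order on the x-axis
  )

p <- ggplot(df, aes(x = ID, y = val, colour = colid)) +
  geom_point(size = 1.5) +
  theme(axis.text.x = element_text(angle = 90))
```

With `val` numeric, the y-axis gets ordinary evenly spaced breaks, and `ylim()` or `scale_y_continuous()` can be used to adjust it.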

Thulasi R
  • It sounds like you need to restructure your data. In a 2D scatter plot, each point needs to have exactly 1 X and 1 Y value. If you can share your data using `dput(data)` we might be able to help more. – Dan Adams Mar 15 '21 at 14:37
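For reference, `dput()` prints R code that rebuilds an object exactly, which others can paste straight into their session. A small sketch; `sample_df` is a hypothetical two-row excerpt using values from the table in the answer:

```r
# Hypothetical excerpt of the data
sample_df <- data.frame(
  ID = c(1, 2),
  W  = c(4.161, 0.891),
  NP = c("1.3,1.5", "1.33,1.8,1.79")
)

# Prints code that recreates sample_df; paste that output into the question
dput(sample_df)

# For a large data set, share only the first few rows
dput(head(sample_df, 2))
```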

1 Answer

This requires several data-cleaning steps, as Dan Adams suggested in the comments. This was kind of fun, and it helped me procrastinate on my own thesis.

I am using a function from a very popular thread that splits a column into several columns when the number of resulting columns is unknown.
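To see what that splitting step produces, here is a small illustration on the NP values from the table below. The answer's helper passes `n = Inf`; this sketch instead computes the width explicitly with `str_count()` so the number of columns is visible. It is an illustration of the idea, not the helper itself:

```r
library(stringr)

np <- c("1.3,1.5", "1.33,1.8,1.79", "4.3", "1.4,0.99,7.9,0.32")

# Width needed = pieces in the longest entry (number of commas + 1)
n_max <- max(str_count(np, ",")) + 1

# str_split_fixed() returns a character matrix with n_max columns;
# entries with fewer pieces are padded with ""
m <- str_split_fixed(np, ",", n = n_max)

# Turn the padding into NA, as split_into_multiple() does
m[m == ""] <- NA
m
```

Each row stays one ID, and the NA padding is what later shows up as missing values in the plot.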

P.S. The way you shared the data was less than ideal.

# your data is unreadable without this awesome package
# devtools::install_github("alistaire47/read.so") 
library(tidyverse)
df <- read.so::read_md("|ID| |W| |NP| |P|
|:-:| |:-:| |:-:| |:-:|
|1| |4.161| |1.3,1.5| |1.5,2.8|
|2| |0.891| |1.33,1.8,1.79| |1.6|
|3| |7.91| |4.3| |0.899,1.43,0.128|
|40| |2.1| |1.4,0.99,7.9,0.32| |0.6,0.5,1.57|") %>% select(-starts_with("x")) 
#> Warning: Missing column names filled in: 'X2' [2], 'X4' [4], 'X6' [6]

# from this thread https://stackoverflow.com/a/47060452/7941188
# splits one character column into as many columns as the longest entry
# needs, padding shorter entries with NA
split_into_multiple <- function(column, pattern = ", ", into_prefix){
  cols <- str_split_fixed(column, pattern, n = Inf)
  cols[which(cols == "")] <- NA
  cols <- as_tibble(cols, .name_repair = "minimal")
  m <- dim(cols)[2]
  names(cols) <- paste(into_prefix, 1:m, sep = "_")
  cols
}
# apply this over the columns of interest (each column is split on its own values)
ls_cols <- lapply(c("NP", "P"), function(x) split_into_multiple(df[[x]], pattern = ",", x))

# bind it to the single columns of the old data frame
# convert character columns to numeric
# apply pivot longer twice (there might be more direct options, but I won't be 
# bothered to do too much here)
df_new <- 
  bind_cols(df[c("ID", "W")], ls_cols) %>%
  pivot_longer(cols = c(-ID,-W), names_sep = "_", names_to = c(".value", "value")) %>%
  mutate(across(c(P, NP), as.numeric)) %>%
  select(-value) %>%
  pivot_longer(W:P, names_to = c("var"), values_to =  "value")

# The new tidy data can easily be plotted 
ggplot(df_new, aes(ID, value, color = var)) + 
  geom_point()
# ggplot2 warns that it removed 13 rows containing missing values:
# these are the NA cells padding NP and P up to their longest entry
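Two side effects of the double pivot are worth knowing: each W value ends up repeated once per split index (drawn on top of itself), and the NA padding triggers the "removed rows" warning. Also, if `ID` is read as a number, ID 40 sits far to the right of 1 to 3. A small cleanup sketch, shown on a hypothetical stand-in for `df_new` restricted to ID 3; dropping the repeated W rows and treating `ID` as a factor are suggestions, not part of the answer above:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for df_new (ID 3 only): one row per (index, variable),
# so W is repeated four times and NP/P are padded with NA
df_new <- tibble(
  ID    = 3,
  var   = rep(c("W", "NP", "P"), times = 4),
  value = c(7.91, 4.3, 0.899,
            7.91, NA,  1.43,
            7.91, NA,  0.128,
            7.91, NA,  NA)
)

plot_df <- df_new %>%
  filter(!is.na(value)) %>%                    # drop the NA padding
  group_by(ID, var) %>%
  filter(var != "W" | row_number() == 1) %>%   # keep a single W point per ID
  ungroup() %>%
  mutate(ID = factor(ID))                      # evenly spaced IDs on the x-axis

p <- ggplot(plot_df, aes(ID, value, colour = var)) +
  geom_point()
```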

tjebo
  • Sorry for the confusion with my data. I'll attach a screenshot of it; I was unable to enter the tabular data properly. Sorry for the mess. – Thulasi R Mar 15 '21 at 17:11
  • @ThulasiR actually the markdown table was better for sharing - a screenshot is really the worst way to share data. check https://stackoverflow.com/help/minimal-reproducible-example – tjebo Mar 15 '21 at 17:27