I have a dataset in wide format that represents lenders' characteristics for a banking credit system. I want to make a scatter plot with ggplot in which the colours represent the purpose of the credit. My table looks like this, where 1 marks the purpose of the credit:

lending_duration  lending_amount  Car  Furniture  TV/RADIO  House
1 month           2000            0    1          0         0
16 months         15600           1    0          0         0
4 month           13094           0    0          0         1
etc...

I tried:

ggplot(Data, aes(x = DURATION, y = AMOUNT))+ geom_point(aes(color = c(Car, Furniture, 'TV/Ratio', House))+ scale_color_viridis_c()

This does not work. My second question: how can I escape the / in a variable name such as TV/Radio? I tried wrapping the name in '' quotes to escape the /, but that does not seem to work either. Can someone help me here? Much appreciated!

Skylar J
  • 4
    You need to reshape your data from wide to long. See https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format. Then you map to color, see for ggplot2 specific answers here: https://stackoverflow.com/questions/3777174/plotting-two-variables-as-lines-using-ggplot2-on-the-same-graph – Axeman Nov 01 '21 at 16:39
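To illustrate the reshaping step suggested in the comment above, here is a minimal sketch. It assumes the indicator columns are 0/1 flags with exactly one 1 per row, and that the durations have already been converted to numbers. The object names `wide` and `long` and the column names `purpose` and `flag` are made up for this example.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide data mirroring the table in the question
# (durations already stored as numbers of months)
wide <- tibble::tibble(
  lending_duration = c(1, 16, 4),
  lending_amount   = c(2000, 15600, 13094),
  Car              = c(0, 1, 0),
  Furniture        = c(1, 0, 0),
  `TV/RADIO`       = c(0, 0, 0),
  House            = c(0, 0, 1)
)

# Stack the four indicator columns into one "purpose" column,
# then keep only the row whose flag is 1 for each loan
long <- wide %>%
  pivot_longer(
    cols      = c(Car, Furniture, `TV/RADIO`, House),
    names_to  = "purpose",  # illustrative name, not from the question
    values_to = "flag"      # illustrative name, not from the question
  ) %>%
  filter(flag == 1) %>%
  select(-flag)

print(long)
```

With the data in this shape, `purpose` is a single column that can be mapped to colour, for example with `aes(lending_duration, lending_amount, colour = purpose)`.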

1 Answer

Here's a solution for both questions. Column names containing special characters can be referenced by wrapping them in backticks, which also makes them easy to rename:

library(tidyverse)  # loads ggplot2, dplyr, tidyr, stringr and purrr

# your sample data in a df
df <- tibble(lending_duration = c("1 month", "16 month", "4 month"), 
       lending_amount = c(2000, 15600, 13094), 
       Car = c(0, 1 ,0), 
       furniture = c(1,0,0), 
       `TV/Radio` = c(0, 0, 0),
       House = c(0, 0, 1)) 

df %>%
  rename(TV_or_Radio = `TV/Radio`) %>%
  pivot_longer(cols = c(Car, furniture, TV_or_Radio, House)) %>%
  # keep only the row that marks each loan's purpose
  filter(value != 0) %>%
  # keep the leading number of lending_duration and convert it to numeric,
  # so that durations are plotted in increasing order
  mutate(lending_duration = as.numeric(str_split(lending_duration, " ") %>% map_chr(., 1))) %>%
  ggplot(aes(lending_duration, lending_amount, color = name)) +
  geom_point(size = 3) +
  scale_color_viridis_d() +
  xlab("lending_duration in months")

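If renaming is not wanted, the backticked name can also be used directly. The sketch below uses a small made-up data frame (the values and the names `df_bt`, `duration` and `amount` are hypothetical) to show a column called `TV/Radio` being accessed and mapped to colour as-is. Single or double quotes do not work for this, because inside `aes()` a quoted name is treated as a constant string rather than a column.

```r
library(ggplot2)

# Hypothetical data; check.names = FALSE stops data.frame() from
# rewriting "TV/Radio" into the syntactic name "TV.Radio"
df_bt <- data.frame(
  duration    = c(1, 16, 4),
  amount      = c(2000, 15600, 13094),
  `TV/Radio`  = c(0, 1, 0),
  check.names = FALSE
)

# Backticks let a non-syntactic column name be used like any other
tv_flag <- df_bt$`TV/Radio`

# The same backticked name works inside aes(); factor() makes the
# 0/1 flag discrete so it gets two distinct colours
p <- ggplot(df_bt, aes(duration, amount, colour = factor(`TV/Radio`))) +
  geom_point(size = 3)

print(p)
```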

mgrund
  • Totally agree with you @tjebo, however, if I understood the question correctly, the problem was how to get rid of the `TV/Radio` column name. – mgrund Nov 03 '21 at 14:37
  • 1
    Yes, my apologies, I misunderstood that! Will delete my comment – tjebo Nov 03 '21 at 14:51