
I am quite new to R, apologies in advance for the ignorance.

I have loaded this CSV file into a data frame "df" using read_csv, assigned row and column names properly, and am now looking to plot it.

        a     b      c    d    ...
x    0.5%  2.5%  16.3%  6.3%   ...
y    1.5%  0.5%  1.3%  6.3%   ...
z    0.5%  8.5%  16.3%  1.3%   ...
.
.

I am having problems converting percentage values such as "52.26%" to double for plotting.

I could convert them column by column using

aNum = as.numeric(sub("%", "", df$a))

However, the same approach doesn't work on the whole data frame:

allNum= as.numeric(sub("%", "", df))

Any help is much appreciated!

sai saran
Zuko
    You can't pass an entire data frame to `sub` or `as.numeric`; both expect a vector. Instead, loop over the columns of `df` with `lapply`: `lapply(df, function(x) as.numeric(sub("%", "", x)))` should work. `lapply` applies the function to each column in turn, and inside the function the current column is referred to as `x`. Note that `lapply` returns a list, so assign the result with `df[] <- ...` if you want to keep a data frame. This overview of how `lapply` works might help: https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family?noredirect=1&lq=1 – Mike H. Dec 07 '18 at 05:33
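A minimal base R sketch of the `lapply` approach from the comment above. The sample data frame is hypothetical and only stands in for the asker's `df`; the `df[] <-` assignment is one common way to keep the data frame structure, not necessarily what the commenter had in mind.

```r
# Hypothetical sample data standing in for the asker's `df`
df <- data.frame(
  a = c("0.5%", "1.5%", "0.5%"),
  b = c("2.5%", "0.5%", "8.5%"),
  row.names = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# lapply() returns a list; assigning into df[] replaces every column
# while keeping the data frame structure and its row names.
df[] <- lapply(df, function(x) as.numeric(sub("%", "", x, fixed = TRUE)))

str(df)
```

The converted values stay in percentage points (0.5 rather than 0.005); divide by 100 if proportions are needed for the plot.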

1 Answer


As Mike H. said, as.numeric takes a vector (which has a single type, such as character or integer) but not a data frame (whose columns can have different types). The tidyverse has some really nice tools for working through a data frame efficiently, and readr's parse_number extracts the number from a string, ignoring non-numeric characters such as the percent sign.

library(tidyverse)
# parse_number() drops the "%" and returns a double for every column
df <- df %>%
  mutate_all(.funs = readr::parse_number)
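For newer dplyr versions, the same idea can be sketched with `across()`, which has superseded `mutate_all()` since dplyr 1.0.0. This assumes dplyr >= 1.0.0 and readr are installed; the sample data and the names `df_num` and `df_prop` are hypothetical.

```r
library(dplyr)
library(readr)

# Hypothetical sample data standing in for the asker's `df`
df <- data.frame(
  a = c("0.5%", "1.5%", "0.5%"),
  b = c("2.5%", "0.5%", "8.5%"),
  stringsAsFactors = FALSE
)

# across(everything(), ...) applies parse_number() to each column;
# parse_number() keeps the first number in each string and drops the rest.
df_num <- df %>%
  mutate(across(everything(), parse_number))

# Optional: divide by 100 to get proportions instead of percentage points
df_prop <- df_num %>%
  mutate(across(everything(), ~ .x / 100))
```

If some columns are not percentages (for example, a label column), `everything()` could be swapped for a narrower selection such as `where(is.character)` or an explicit list of column names.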
Erin