Converting factor to integer from a .csv using RStudio.

Hi, I know this question has been asked frequently but I've been trying to wrap my head around things for an hour with no success.

In my .csv file 'Weighted.average' is a calculation of Weighted.count/count (before conversion), but when I use the file in R it is a factor, despite being completely numeric (with decimal points).

I'm aiming to aggregate the data using Weighted.average's numeric values. But as it is still considered a factor it doesn't work. I'm newish to R so I'm having trouble converting other examples to my own.

Thanks

library(ggplot2)

RENA <- read.csv('RENA.csv')
# Sum Weighted.average for each Diet / DGRP.Line combination
RENAVG <- aggregate(Weighted.average ~ Diet + DGRP.Line, data = RENA, FUN = sum)
ggplot(RENAVG, aes(x = DGRP.Line, y = Weighted.average, colour = Diet)) +
  geom_point()

I expected this to produce a dot plot of Weighted.average, but instead I get this error:

Error in Summary.factor(c(3L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, : ‘sum’ not meaningful for factors

I know it's because the column isn't being read as a number, but I'm lost on how to convert it.

Thanks

Output from dput

> dput(head(RENA))
structure(list(DGRP.Line = structure(c(19L, 19L, 19L, 19L, 20L, 
20L), .Label = c("105a", "105b", "348", "354", "362a", "362b", 
"391a", "391b", "392", "397", "405", "486a", "486b", "712", "721", 
"737", "757a", "757b", "853", "879"), class = "factor"), Diet = structure(c(1L, 
1L, 2L, 2L, 1L, 1L), .Label = c("Control", "Rena"), class = "factor"), 
    Sex = structure(c(2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Female", 
    "Male"), class = "factor"), Count = c(0L, 0L, 0L, 0L, 1L, 
    0L), Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("16/07/2019", 
    "17/07/2019", "18/07/2019", "19/07/2019", "20/07/2019", "21/07/2019", 
    "22/07/2019"), class = "factor"), Day = c(1L, 1L, 1L, 1L, 
    1L, 1L), Weighted.count = c(0L, 0L, 0L, 0L, 1L, 0L), Weighted.average = structure(c(60L, 
    59L, 52L, 63L, 44L, 36L), .Label = c("", "#DIV/0!", "1.8", 
    "1.818181818", "2", "2.275862069", "2.282608696", "2.478873239", 
    "2.635135135", "2.705882353", "2.824561404", "2.903614458", 
    "2.911392405", "2.917525773", "3", "3.034090909", "3.038461538", 
    "3.083333333", "3.119402985", "3.125", "3.154929577", "3.175438596", 
    "3.1875", "3.220338983", "3.254237288", "3.263157895", "3.314606742", 
    "3.341463415", "3.35", "3.435483871", "3.5", "3.6", "3.606557377", 
    "3.666666667", "3.6875", "3.694214876", "3.797619048", "3.813953488", 
    "3.833333333", "3.875", "3.909090909", "3.916666667", "4.045454545", 
    "4.047169811", "4.111111111", "4.333333333", "4.40625", "4.444444444", 
    "4.529411765", "4.617021277", "4.620689655", "4.666666667", 
    "4.714285714", "4.732283465", "4.821428571", "4.823529412", 
    "4.846153846", "4.851851852", "4.855263158", "4.884615385", 
    "4.956521739", "5", "5.115384615", "5.230769231", "5.343283582", 
    "5.45", "5.464285714", "5.484848485", "5.538461538", "5.551724138", 
    "5.970588235", "6", "6.2"), class = "factor")), row.names = c(NA, 
6L), class = "data.frame")
Kodewings
  • Whichever columns in `RENA` are factors convert to numbers with `as.integer`? It would be useful to see some representative data, could you edit your question and add the output from `dput(head(RENA))`? – r2evans Jul 23 '19 at 14:56
  • I've added the dput information – Kodewings Jul 23 '19 at 15:00
  • Thanks, removed the #DIV/0 manually. I'll know what to look out for next time. – Kodewings Jul 23 '19 at 15:17
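For anyone who cannot edit the spreadsheet, here is a minimal sketch of converting the column inside R instead. The `demo` data frame is a made-up stand-in for `RENA` (the values are borrowed from the dput output above), and `codes` and `demo_avg` are illustrative names only. It also shows why `as.integer()` on a factor is not what you want: it returns the internal level codes, not the numbers you see printed.

```r
# Toy data frame (hypothetical) mimicking the structure of RENA
demo <- data.frame(
  DGRP.Line = factor(c("853", "853", "879", "879")),
  Diet = factor(c("Control", "Rena", "Control", "Rena")),
  Weighted.average = factor(c("4.884615385", "#DIV/0!", "4.045454545", "3.694214876"))
)

# as.integer() on a factor returns the internal level codes, not the values
codes <- as.integer(demo$Weighted.average)

# Go through character first; "#DIV/0!" cannot be parsed and becomes NA
# (the coercion warning is expected here, so it is suppressed)
demo$Weighted.average <- suppressWarnings(
  as.numeric(as.character(demo$Weighted.average))
)

# The formula method of aggregate() drops rows containing NA by default
demo_avg <- aggregate(Weighted.average ~ Diet + DGRP.Line, data = demo, FUN = sum)
```

Applied to the real data, the same conversion would be `RENA$Weighted.average <- as.numeric(as.character(RENA$Weighted.average))`, run before the `aggregate()` call.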

1 Answer

Just modify your first line (the read.csv call) to specify the class of each column during the import.
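As a rough sketch of what that could look like, assuming the file contains the spreadsheet error string `#DIV/0!` and blank cells as seen in the dput output: the temporary file and its rows below are invented so the example runs on its own, and `demo` is a hypothetical name. Note that forcing a column to "numeric" fails if it contains unparseable text, so the error string is declared as missing via `na.strings`.

```r
# Write a tiny stand-in for RENA.csv (hypothetical rows) to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "DGRP.Line,Diet,Weighted.average",
  "853,Control,4.884615385",
  "853,Rena,#DIV/0!",
  "879,Control,",
  "879,Rena,3.694214876"
), tmp)

# Treat blanks and the spreadsheet error string as NA, and set the class
# of each column explicitly instead of letting read.csv guess
demo <- read.csv(
  tmp,
  na.strings = c("", "NA", "#DIV/0!"),
  colClasses = c(DGRP.Line = "factor", Diet = "factor", Weighted.average = "numeric")
)

str(demo)
```

With the real file, the same two arguments would be added to `read.csv('RENA.csv', ...)`, listing whichever columns need a fixed class.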

Vincent Chalmel
  • How would I do that? – Kodewings Jul 23 '19 at 14:59
  • @Megascops Check the documentation for read.csv and use the `colClasses` parameter: "A vector of classes to be assumed for the columns. Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class. Note that colClasses is specified per column (not per variable) and so includes the column of row names (if any)." – Vincent Chalmel Jul 23 '19 at 15:05