
I have a csv file with a variable (id). In Excel, when I check the format of the cells, some cells are General format and some are Scientific:

#            id
# 1 ge189839898     #general format cell in excel
# 2   we7267178     #general format cell in excel
# 3     2.8E+12     #scientific format cell in excel

When I read the file into R using read_csv, it thinks that the column is character (which it is and what I want), but that means 2.8E+12 is also read in as a character string.

options(digits = 22, scipen = 9999)
library(tidyverse)
dfcsv <- read_csv("file.csv")
# where dfcsv looks like:
dfcsv <- data.frame(id = c("ge189839898",
                           "we7267178",
                           "2.8E+12"))
dfcsv
#            id
# 1 ge189839898     
# 2   we7267178    
# 3     2.8E+12  

Is there a way to automatically read in the csv so that variables with mixed types are correctly identified, i.e. the column is returned as a character variable but the scientific notation is expanded:

#               id
# 1    ge189839898
# 2      we7267178
# 3  2800000000000

I don't think guess_max is what I am after here. I would also prefer not to use grep/sprintf-type solutions (if possible), as I think that is trying to fix a problem I shouldn't have. I found these problematic IDs by chance, so I would like an automated way of doing this at the reading-in stage.

The cleanest solution is probably to go into the csv file and make the conversion there, but I want to do it through R.

Thanks

user63230
  • Columns in R cannot have mixed types. There's no way to have `readr` do what you want during import. You need to clean up the data after the fact. It would be better if you found a way to export the data from Excel without it incorrectly reformatting the data. Are you sure those ID values are always just filled with zeros? When Excel switches to scientific notation, I'd guess you are actually losing the last few digits of precision, which could be very important for IDs – MrFlick Mar 30 '21 at 16:56
  • You can find numeric values in your ID columns with this existing answer: https://stackoverflow.com/questions/39822643/extract-numeric-elements-from-a-vector – MrFlick Mar 30 '21 at 16:56
  • 1
    @MrFlick you are of course right with your first comment, something I naively overlooked. In this case is my only solution to go back to excel? The link wont extract the numbers with `e` correctly as they have already been converted to character. – user63230 Mar 30 '21 at 17:07
  • 1
    Well, If you don't go back to Excel I'm pretty sure you're going to lose information. If you're OK with that, then you can proceed in R. In the end the column is going to have to be character. – MrFlick Mar 30 '21 at 17:12
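
A short illustration of the information loss described in the comments above (the "full" ID here is hypothetical): once Excel has displayed a long numeric ID as 2.8E+12 and the csv has been saved, only the rounded value is left in the file, so the trailing digits cannot be recovered in R.

full_id   <- "2801234567891"   # hypothetical original ID before Excel reformatted it
excel_csv <- "2.8E+12"         # what actually ends up in the exported csv
format(as.numeric(excel_csv), scientific = FALSE)
# [1] "2800000000000"          # the original trailing digits are gone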

1 Answer

id <- c("ge189839898", "we7267178", "2.8E+12")
func <- function(x) {
  # attempt to parse every value as numeric; non-numeric IDs become NA
  poss_num <- suppressWarnings(as.numeric(x))
  isna <- is.na(poss_num)
  # re-format the values that did parse, suppressing scientific notation
  x[!isna] <- format(poss_num[!isna], scientific = FALSE)
  x
}
func(id)
# [1] "ge189839898"   "we7267178"     "2800000000000"
r2evans
  • this answers my question, so thank you, but I naively overlooked @MrFlick's 1st comment above, so I asked the wrong question in a way. My `ids` are not just filled with zeros – user63230 Mar 30 '21 at 17:00
  • I really don't understand what you're going for, then. You specifically stated *"return a character variable but scientific notation is expanded"*, meaning you were asking for the updated value to *still be strings*, which is what this does. I inferred that since you expressly wanted strings, you knew (or did not care) about the no-mixed-types thing. Over to you to figure out what you need here. – r2evans Mar 30 '21 at 17:09
  • Going back to Excel is my only option here, I think. Thanks for the solution, which I will use again! – user63230 Mar 30 '21 at 17:13