
I often deal with CSV files that were saved with a German locale and are therefore separated by semicolons rather than commas. This is easily solved by defining the separator. But vroom, in contrast to fread for example, does not seem to offer a way to also define the decimal separator. As a result, numeric values with a comma as decimal separator are imported either as characters or as numbers without any decimal separator, which makes them far too large. Is there a way to define the decimal separator directly, similar to how it works in fread?

library(vroom)
library(data.table)
   
df <- data.table(row.num = 1:10
                 , V1 = rnorm(10,10,5)
                 , V2 = rnorm(10,100,30))

fwrite(df, file = "vroom_test.csv", sep = ";", dec = ",")

fread(input = "vroom_test.csv", sep = ";", dec = ",")

vroom(file = "vroom_test.csv", delim = ";")
# defining a custom locale does allow that
vroom(file = "vroom_test.csv", delim = ";", locale = locale(grouping_mark = ".", decimal_mark = ",", encoding = "UTF-8"))
hannes101
  • I searched the `?vroom` help page for "decimal" and found the `locale` argument, which says *"you can use `locale()` to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names."* Have you tried that? – Gregor Thomas Feb 02 '22 at 15:19
  • OK, I added the solution. I'm not sure whether both `grouping_mark` and `decimal_mark` need to be defined, but better safe than sorry. Thanks – hannes101 Feb 02 '22 at 15:32
  • You should post the solution as an answer, not edit it into your question! [Answering your own question is strongly encouraged](https://stackoverflow.com/help/self-answer). – Gregor Thomas Feb 02 '22 at 15:40
  • As for whether grouping_mark and decimal_mark both need to be included, it will depend on your input file whether both are present. As you say, better safe than sorry. – Gregor Thomas Feb 02 '22 at 15:41

1 Answer


As already mentioned in the comments, the solution is rather straightforward: the only thing necessary is to pass a locale() object to the locale argument of the vroom call. The settings that locale() accepts are listed in its documentation.

vroom(file = "vroom_test.csv", delim = ";", locale = locale(grouping_mark = ".", decimal_mark = ",", encoding = "UTF-8"))
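To check the result without relying on the random data from the question, here is a minimal sketch that uses a small hand-written file. The file contents and the variable names (path, de, res) are made up for illustration. It also prints the grouping mark of a locale created with only decimal_mark set, which is one way to see whether grouping_mark really has to be passed explicitly in your installed version.

```r
library(vroom)

# Hypothetical sample file: semicolon-delimited, comma as decimal mark.
path <- tempfile(fileext = ".csv")
writeLines(c("row.num;V1;V2", "1;10,5;100,25", "2;7,25;98,5"), path)

# Set only the decimal mark, then inspect which grouping mark the
# resulting locale object carries.
de <- locale(decimal_mark = ",")
print(de$grouping_mark)

# Read the file with the custom locale; suppressMessages() only hides
# the column specification message.
res <- suppressMessages(vroom(file = path, delim = ";", locale = de))

# V1 and V2 should now be numeric columns rather than character.
str(res)
```

If the columns still come back as character, the file may contain values that do not parse as numbers under the chosen locale, so it is worth looking at the raw lines first.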
hannes101