
I have a very stupid problem that made me lose a few hours, hence I thought about posting it here.

My data looks something like this:

df <- data.frame("Reporter" = c("USA", "USA", "USA", "USA",
                                "Africa", "Africa", "Africa", "Africa"),
                 "Partner" = c("Africa", "Africa", "EU", "EU",
                               "USA", "USA", "EU", "EU"),
                 "Year" = c(1970, 1980, 1970, 1980, 1970, 1980, 1970, 1980),
                 "Flow" = c("001", "00", "1", "112", "0", "2", "23", "TOT"),
                 "Val" = runif(8, min = 0, max = 100),
                 stringsAsFactors = FALSE)

Flow is a character variable that includes both letters and numbers. These are identifiers for the variable "Val".

class(df$Flow)

I would like to remove the rows that have letters in "Flow" while keeping the rest.

library(dplyr)

df <- df %>% filter(Flow != "TOT")

This approach works as I expect. The problem appears later, once I have removed the letters from Flow and saved my data as csv.

 write.csv(df, "df.csv")

Once I re-import my data, it has changed: all the leading 0s have been lost, because the column has been read back as numeric.

 df2<- import("df.csv")

I have also tried write.csv2(df, "df.csv"), but the result does not change. If I save as dta instead, the data is fine once re-imported, but I would like to save as csv.

Does anyone know what I am doing wrong?

Alex
  • What is the reasoning behind using `import()` here? I see the same behaviour using `read.csv()` and as far as I can see the easiest solution is to define your `colClasses` within the likes of `read.csv()` so that it imports this column as character. The important thing here is that you are not doing anything wrong when saving the data but instead the problem is occurring when you read the data again. – Tom Haddow May 08 '19 at 15:26
  • Possible duplicate of [R write dataframe column to csv having leading zeroes](https://stackoverflow.com/questions/28675279/r-write-dataframe-column-to-csv-having-leading-zeroes), although the answers to @Alessandro's question are much more efficient here. – TheSciGuy May 08 '19 at 15:32

2 Answers


The problem you are having is not how the csv is being written, but how it is being read back in. If you look at the file in a text editor, you should see the leading zeros still there.

In base R, you can specify the class of the columns, which will prevent the leading zeros from being dropped:

> (df2 <- read.csv('df.csv', colClasses = c(Flow = 'character')))
  X Reporter Partner Year Flow       Val
1 1      USA  Africa 1970  001 87.582979
2 2      USA  Africa 1980   00  1.908992
3 3      USA      EU 1970    1 41.421509
4 4      USA      EU 1980  112 59.110781
5 5   Africa     USA 1970    0 27.277206
6 6   Africa     USA 1980    2 29.184184
7 7   Africa      EU 1970   23 37.417494
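
To check this without opening a text editor, here is a minimal sketch of the full round trip. The toy data frame and the temporary file path are assumptions made for illustration and are not the asker's real data. It prints the raw lines of the file, then compares the default read with one that sets `colClasses`.

```r
# Toy data for illustration only (not the original dataset)
df <- data.frame(Flow = c("001", "00", "1"),
                 Val = c(1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)

# Write to a temporary file so nothing in the working directory is touched
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# The raw file still contains the leading zeros
raw_lines <- readLines(path)
print(raw_lines)

# Default read: Flow is converted to a number and the zeros are dropped
df_default <- read.csv(path)

# Explicit class: Flow is kept as character, zeros included
df_char <- read.csv(path, colClasses = c(Flow = "character"))

print(class(df_default$Flow))
print(df_char$Flow)
```

Only the column named in `colClasses` is forced to character here; the other columns are still guessed as usual.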
C. Braun
  • Thanks, this could work. The issue is that when I use the original dataset I get this error message: **Error in read_csv("@@experiment.csv", colClasses = c('Commodity Description' = "character")) : unused argument (colClasses = c('Commodity Description' = "character"))**. Do you know what could be causing it? – Alex May 08 '19 at 15:50
  • @Alessandro The function `read_csv` is from the tidyverse package `readr`, and has different parameter names than `read.csv`. Instead of `colClasses`, you probably want `col_types`. – C. Braun May 08 '19 at 16:37
  • yes in the end I solved the issue using: `read_csv("df.csv", col_types = cols(.default = col_guess(), "Flow" = "c"))` – Alex May 08 '19 at 17:43

You can use read_csv() from readr. It has a col_types parameter, but even with the default value it does what you want.

df <- read.csv("df.csv")
class(df$Flow)
# [1] "integer"

library(readr)
df <- read_csv("df.csv")
class(df$Flow)
# [1] "character"
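
Relying on the guessed type can be fragile: if a file happens to contain no values with leading zeros (say only "1" and "112"), the column would be guessed as numeric. A small sketch of pinning the type explicitly is shown here, assuming readr is installed. The sample file contents and the temporary path are made up for illustration.

```r
library(readr)

# Illustrative file written to a temporary path (not the original data)
path <- tempfile(fileext = ".csv")
writeLines(c("Flow,Val", "001,1.5", "00,2.5", "1,3.5"), path)

# Force Flow to character; columns not listed in cols() are still guessed
df_explicit <- read_csv(path, col_types = cols(Flow = col_character()))

print(class(df_explicit$Flow))
print(df_explicit$Flow)
```

This is the same idea as the `col_types = cols(.default = col_guess(), "Flow" = "c")` call mentioned in the comments, written out with `col_character()`.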
Yuriy Barvinchenko