
I exported a file from Tableau that I want to read in R. But when I load the CSV, R reports an error. Checking the file, it is laid out like this:

Año Categoría Migratoria    Centro Regional Ciudad Hospedaje    Colombiano Extranjero   Departamento Hospedaje  Departamento1   Entrada Salida  Entrada Salida (copia)  Meses1  Motivo Viaje    País Destino Procedencia    País Nacionalidad   Puesto Migratorio   Rango Edad  Region Destino  Region Nacionalidad Sexo1   Tipo Transporte Cantidad de filas (agregadas)   aFemenino   Masculino   Number of Records
2022    Sin Especificar Antioquia   Sabaneta    Colombianos Antioquia   Antioquía   Entradas    Entradas    Septiembre  Residente   República Dominicana    Colombia    Aeropuerto José María Córdova de Rionegro   0-17    América Central y el Caribe América del Sur Femenino    Aéreo   2   2   0   1
2022    Sin Especificar Antioquia   Rionegro    Colombianos Antioquia   Antioquía   Entradas    Entradas    Septiembre  Turismo México  Colombia    Aeropuerto José María Córdova de Rionegro   0-17    América Central y el Caribe América del Sur Femenino    Aéreo   1   1   0   1
2022    Sin Especificar Antioquia   Envigado    Colombianos Antioquia   Antioquía   Entradas    Entradas    Septiembre  Turismo República Dominicana    Colombia    Aeropuerto José María Córdova de Rionegro   0-17    América Central y el Caribe América del Sur Femenino    Aéreo   3   3   0   1
2022    Sin Especificar Antioquia   Barranquilla    Colombianos Atlántico   Antioquía   Entradas    Entradas    Septiembre  Residente   República Dominicana    

Excel reads it flawlessly, but R gives me this error when I run:

df <- read.csv(file = pathfile, header = TRUE, sep = " ", dec = ".", encoding = "ANSI", stringsAsFactors = FALSE)

Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string at '<ff><fe>A'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 5 appears to contain embedded nulls
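
The `<ff><fe>` in the error is a clue: those are the first two bytes of a UTF-16LE byte-order mark (BOM). A minimal sketch to confirm this directly, assuming the `pathfile` variable from the call above (the commented-out cross-check additionally assumes the readr package is installed):

# Read the first few raw bytes; "ff fe" at the start marks UTF-16LE
first_bytes <- readBin(pathfile, what = "raw", n = 4)
print(first_bytes)

# Optional cross-check (requires the readr package):
# readr::guess_encoding(pathfile)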
  • If someone is interested in seeing the whole file, please check https://analisissocial.s3.amazonaws.com/bbb.csv – lasagna Nov 06 '22 at 16:07
  • It is choking on the eñe in the Año column, or so it appears. Fish around for the proper encoding. Just messing around with the hex 'A' above. – Chris Nov 06 '22 at 16:12
  • Some approaches to investigating [file encoding](https://stackoverflow.com/questions/4806823/how-to-detect-the-right-encoding-for-read-csv), a fun kind of problem... – Chris Nov 06 '22 at 16:19
  • If your file is delimited by a space (or `\t` in this case), then it is not a CSV. Try `read.delim()` or `readr::read_tsv()`. If you have another file encoding problem, you still might have to account for that. –  Nov 06 '22 at 16:29
  • `sep = ""` works for a single space or any run of whitespace, including tabs, new lines, and carriage returns. Your original call has a length-two `sep`, so that will throw an error as well; `sep = " "` will work, `sep = "  "` will not (see the short demo after these comments). – rawr Nov 06 '22 at 16:43
  • Right, but it's still not a CSV and it creates a confusing workflow. It does work, of course, but better to use the correct function. Both of them simply forward to `read.table()`. But why not use the one with the default you want, instead of changing it? – Nov 06 '22 at 16:46
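
A tiny self-contained demo of rawr's point about `sep`, using inline text rather than the original file (which needs `sep = "\t"` because its fields contain spaces):

# sep = "" splits on any run of whitespace (spaces, tabs, newlines)
txt <- "a b\tc\n1 2\t3"
read.table(text = txt, header = TRUE, sep = "")

# A separator longer than one byte is rejected:
# read.table(text = txt, header = TRUE, sep = "  ")
# Error: invalid 'sep' value: must be one byte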

1 Answer


Well, I realized that the data is separated by tabs, which is specified with `\t`, and I changed the encoding following the recommendation of @Chris. This line worked to load it:

o <- read.csv(pathfile, header = TRUE, sep = "\t", fileEncoding = "UTF-16LE")
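
Note that it is `fileEncoding`, not `encoding`, that makes `read.csv()` re-encode the file as it reads it, which is part of why the original call failed even apart from the wrong separator. As suggested in the comments, `read.delim()` already defaults to tabs, so an equivalent sketch under the same assumptions is:

# read.delim() uses header = TRUE and sep = "\t" by default,
# so only the encoding needs to be supplied
o <- read.delim(pathfile, fileEncoding = "UTF-16LE")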