
I have spent hours trying out code posted here on SO. The files open successfully, but when they are saved they are still in Unicode Text format and pandas' read_csv won't read them. Here is my latest and greatest attempt. My files are in subfolder "QCfiles":

Sub LoopExample()

Dim MyFolder As String, MyFile As String

With Application.FileDialog(msoFileDialogFolderPicker)
    .AllowMultiSelect = False
    .Show
    MyFolder = .SelectedItems(1)
    Err.Clear
End With

Application.ScreenUpdating = False

MyFile = Dir(MyFolder & "\", vbReadOnly)

Do While MyFile <> ""
    Workbooks.Open FileName:=MyFolder & "\" & MyFile, UpdateLinks:=False
    MsgBox Cells(1, 1)
    'Save with the full path; without it Excel writes to the current directory
    ActiveWorkbook.SaveAs FileName:=MyFolder & "\" & MyFile, FileFormat:=xlCSV
    ActiveWorkbook.Close SaveChanges:=False
    MyFile = Dir
Loop

Application.ScreenUpdating = True

End Sub
BigBen
    "Regular" csv files don't care what the encoding is. What encoding does pandas expect them to be in? – Comintern Nov 28 '18 at 17:00
  • You may use simple `ReadTextFile` and `WriteTextFile` functions from [here](https://stackoverflow.com/a/43265363/2165759) to read the content of the files in Unicode and then save them in ASCII. – omegastripes Nov 28 '18 at 17:13

2 Answers


Before you modify the CSV (via VBA), try passing an argument to the encoding parameter of pandas.read_csv().

As I understand it, encoding defaults to your system's settings (which might be 'cp1252' if your regional settings are some variant of English).

To override this implicit behaviour, you can try passing encoding='utf-8' (or encoding='utf-8-sig' if the file starts with a byte-order mark), which should then allow pandas to decode/work with Unicode characters.
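One more thing worth knowing: Excel's "Unicode Text" format (which the asker's files seem to be in) is UTF-16 and tab-delimited, so encoding='utf-16' together with sep='\t' may be what's actually needed, with no VBA conversion at all. A self-contained sketch, using an in-memory buffer in place of a real file:

```python
import io

import pandas as pd

# Simulate an Excel "Unicode Text" export: UTF-16 (with BOM), tab-delimited
raw = 'a\tb\n1\t2\n'.encode('utf-16')

df = pd.read_csv(io.BytesIO(raw), encoding='utf-16', sep='\t')
print(df.shape)  # (1, 2): one data row, two columns
```

For a real file you would pass its path as the first argument instead of the buffer.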

Say you have imported pandas under the namespace pd and your file is at C:\some_folder\ok.csv (on Windows):

import pandas as pd

df = pd.read_csv('C:\\some_folder\\ok.csv', encoding='utf-8')

print(df.head())

Worth trying in my opinion, as it could be a quick fix.

(Another option might be to import a Python library which can attempt to infer the file's encoding, then use that particular encoding, but this is obviously more work.

Also, you may lose information/bytes when forcefully coercing Unicode to ASCII, if there are any non-ASCII characters.)
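One such library is chardet, but for files that carry a byte-order mark you don't even need a third-party package. A minimal stdlib-only sketch (the helper name is made up here) that sniffs the BOM and falls back to utf-8:

```python
import codecs

def guess_encoding(path):
    """Guess a text file's encoding from its BOM; fall back to utf-8."""
    with open(path, 'rb') as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    # Note: a UTF-32-LE BOM also begins with the UTF-16-LE bytes;
    # ignored here for brevity
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    return 'utf-8'
```

Then something like pd.read_csv(path, encoding=guess_encoding(path)) should pick the right codec for BOM-prefixed files.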

chillin
  • I found a solution. Thanks for the reply. I succeeded using the 'data.table' package in R. It imports the files as tibbles which are easy to save as regular csv files. I'm a Python fan esp Pandas but it falls down on reading irregular data types. I tried the encoding = 'utf-8' argument but it did not work. R has much better data reading packages in my opinion. – user3779318 Nov 29 '18 at 17:10

This works in R. I will switch back to Python/pandas for further work now. R is excellent at reading messy data.

# read_csv()/write_csv() come from readr, not data.table,
# so that is the package to install and load
install.packages('readr')
library(readr)

# the original pattern "QCfiles/." matches no files; "*" matches them all
fileNames <- Sys.glob("C:/Users/fkean/TOS_csv/QCfiles/*")

for (f in fileNames) {
  sample <- read_csv(f, skip = 2)
  write_csv(sample, f, append = FALSE)
}
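For comparison, the same batch re-save can be sketched back in pandas. This assumes the files are Excel "Unicode Text" exports (UTF-16, tab-delimited); the folder path is the one from the R snippet, and the function name is made up:

```python
import glob

import pandas as pd

def resave_as_plain_csv(folder):
    # Read each file as UTF-16 tab-delimited text, skipping the first
    # 2 rows as the R loop does, then overwrite it as a plain UTF-8 CSV
    # that pd.read_csv() can open with no extra arguments.
    for path in glob.glob(folder + '/*.csv'):
        df = pd.read_csv(path, skiprows=2, encoding='utf-16', sep='\t')
        df.to_csv(path, index=False, encoding='utf-8')

resave_as_plain_csv('C:/Users/fkean/TOS_csv/QCfiles')  # asker's folder
```

If the folder does not exist, glob simply returns an empty list and the loop is a no-op.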