I have a text file with UTF-16 encoding which looks like
"Day;""name"";""age"";""money"""
"2021-04-01;""mona"";""27"";"""""
"2021-04-02;""Jan"";""29"";"""""
"2021-04-03;""Shan"";""25"";"""""
I want to convert it into pandas dataframe, based on one another answer I tried this
# Just manually make the columns
cols = ['Day', 'name', 'age', 'money']
df = pd.DataFrame(columns=cols)
#df
#Empty DataFrame
#Columns: [Day, name, age, money]
#Index: []
# Read the CSV in to a list
import csv
data = []
with open('test.csv', 'r') as f:
reader = csv.reader(f)
# Skip the header since we already have column names
next(reader)
for row in reader:
# List comprehension to edit the strings
edited = [s.replace('"', '').split(';') for s in row][0]
data.append(edited)
# Use our column names to make a new dataframe
df2 = df.append(pd.DataFrame(data, columns=cols))
but this doesn't work and we get error in line (The idea was to trim all the " from the .txt file and then read it in a dataframe with a delimiter ';')
edited = [s.replace('"', '').split(';') for s in row][0]
----list index out of range---
The reason is that if I print the rows from above code we get
['2021-04-01;"pqr";"15";""'] ['2021-04-02;"pqrs";"152";""'] ['2021-04-03;"pqrt";"152";""']
so basically all the columns are now inside just 1 column and above code to split isn't working.
my requested solution should look something like
Day name age money
0 2021-04-01 mona 27 NaN
1 2021-04-02 Jan 29 NaN
2 2021-04-03 Shan 25 NaN
Can anyone help me please, Thanks in advance.