I am trying to analyse a WhatsApp chat export by loading it into a pandas DataFrame, but when I load it, everything is read into a single column. What do I need to change to fix this? I believe the problem is due to how the text needs to be formatted.
I have tried reading the file and then using pandas to split it into columns, but because of how it is read, I believe pandas only sees one column. I have also tried pd.read_csv, including the sep parameter, and that does not give the correct result either.
The WhatsApp data appears as follows in my notebook:
[01/09/2017, 13:51:27] name1: abc
[02/09/2017, 13:51:28] name2: def
[03/09/2017, 13:51:29] name3: ghi
[04/09/2017, 13:51:30] name4: jkl
[05/09/2017, 13:51:31] name5: mno
[06/09/2017, 13:51:32] name6: pqr
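Each line above has the same shape, so one way to get four fields out of it is a regular expression with one capture group per desired column. This is a minimal sketch, assuming every line starts with "[date, time] name: "; the name `LINE_PATTERN` is hypothetical.

```python
import re

# Hypothetical pattern: one capture group per column the question wants
# (date, time, name, message), assuming every line starts "[date, time] name: ".
LINE_PATTERN = re.compile(r"^\[(\d+/\d+/\d+), (\d+:\d+:\d+)\] (.+?): (.*)$")

match = LINE_PATTERN.match("[01/09/2017, 13:51:27] name1: abc")
print(match.groups())  # ('01/09/2017', '13:51:27', 'name1', 'abc')
```

The non-greedy `(.+?)` stops the name at the first ": ", so a message that itself contains ": " stays intact in the last group.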
The Python code is as follows:
import re
import sys
import pandas as pd
pd.set_option('display.max_rows', 500)
def read_history1(file):
    chat = open(file, 'r', encoding="utf8")
    # get all lines which exist in this format
    messages = re.findall(r'\d+/\d+/\d+, \d+:\d+:\d+\W .*: .*', chat.read())
    print(messages)
    chat.close()
    # make the messages into a DataFrame
    history = pd.DataFrame(messages, columns=['Date', 'Time', 'Name',
                                              'Message'])
    print(history)
    return history
# the encoding is added because of the way the file is written, see
# https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character/9233174
# I tried using sep, but it is not ideal for this data
def read_history2(file):
    messages = pd.read_csv(file)
    messages.columns = ['a', 'b']
    print(messages.head())
    return messages
filename = "AFC_Test.txt"
read_history2(filename)
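A likely reason read_history1 ends up with one column: the pattern passed to re.findall has no capture groups, so findall returns one whole string per match, and a DataFrame built from a list of strings has a single column, which is why passing four column names fails. The sketch below contrasts that with a pattern that has four groups, for which findall returns a 4-tuple per match. The sample text stands in for chat.read() and is assumed to match the layout in the question.

```python
import re
import pandas as pd

# Stand-in for chat.read(); assumes the file follows the layout in the question.
text = """[01/09/2017, 13:51:27] name1: abc
[02/09/2017, 13:51:28] name2: def
[03/09/2017, 13:51:29] name3: ghi"""

# The original pattern has no groups, so findall returns one whole string per
# match; a DataFrame built from that has a single column.
whole_lines = re.findall(r"\d+/\d+/\d+, \d+:\d+:\d+\W .*: .*", text)

# Four capture groups make findall return a 4-tuple per match instead,
# which maps onto the four requested columns.
rows = re.findall(r"\[(\d+/\d+/\d+), (\d+:\d+:\d+)\] (.+?): (.*)", text)
history = pd.DataFrame(rows, columns=["Date", "Time", "Name", "Message"])
print(history)
```

Because `.` does not match a newline by default, each match stays within one line of the export.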
The two methods I have tried are above. I expect 4 columns: the date, time, name and message for each row.
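With read_csv, the default separator is a comma, so each line is split at the comma inside "[01/09/2017, 13:51:27]", and the first line is taken as the header. An alternative sketch is to keep every line whole in a single column and then split it with `Series.str.extract`, where named groups become the column names. Here io.StringIO stands in for the real file; the group names simply mirror the four expected columns.

```python
import io
import pandas as pd

# io.StringIO stands in for the real file (e.g. open("AFC_Test.txt", encoding="utf8")).
raw = io.StringIO(
    "[01/09/2017, 13:51:27] name1: abc\n"
    "[02/09/2017, 13:51:28] name2: def\n"
    "[03/09/2017, 13:51:29] name3: ghi\n"
)

# Keep each line whole in one Series instead of letting read_csv split on commas.
lines = pd.Series(raw.read().splitlines())

# Named groups become the column names; lines that do not match give NaN rows.
pattern = (
    r"\[(?P<Date>\d+/\d+/\d+), (?P<Time>\d+:\d+:\d+)\] "
    r"(?P<Name>.+?): (?P<Message>.*)"
)
history = lines.str.extract(pattern)
print(history)
```

If the real export contains lines without a leading timestamp (for example, if multi-line messages are written that way), those rows come out as NaN and would need to be merged into the previous message or dropped.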