
I have a very unorganized dataset in a text file, say file.txt.

A sample looks like this:

  TYPE  Invoice           C          AC      DATE      TIME  Total Invoice   Qty1           ITEMVG By          Total 3,000.00
                                                                                                       Piece           Item
                    5696                         01/03/2018  09:21       32,501.35   1   Golden Plate ÞÔÞæÇä ÈÞÑ      6,517.52
                                                                                     1   áÈä ÑæÇÈí ÊÚäÇíá  2 ßÛ       4,261.45
                                                                                     1   Magic chef pop corn 907g     3,509.43
                                                                                     1   áÈäÉ ÊÚäÇíá ÔÝÇÝÉ 1 ßíáæ     9,525.60
                                                                                     1   KHOURY UHT 1 L               2,506.74
                                                                                     1   ÎÈÒ ÔãÓíä ÕÛíÑ               1,002.69
                                                                                     2   Almera 200Tiss               2,506.74
                                                                                   1.55  VG Potato                    1,550.17
                                                                                   0.41  VG Eggplant                    619.67
                                                                                     1   Delivery Charge                501.35

                    5697                         01/03/2018  09:31       15,751.35  0.5  Halloum 1K.                  4,476.03
                                                                                   0.59  Cheese double Cream          3,253.75
                                                                                     3   ãæáÇä 쾄 ÎÈÒ æÓØ 32         3,760.11
                                                                                     3   ãæáÇä 쾄 ÎÈÒ æÓØ 32         3,760.11
                                                                                     1   Delivery Charge                501.35

I want to import it into a pandas DataFrame with a MultiIndex. Can someone help me with this?

In fact, I cannot even read it as a plain text file:

# Read the unorganized data from the text file
with open('file.txt', 'r') as file1:
    UnOrgan = file1.read()
rsc05
  • Possible duplicate of [While reading file on Python, I faced the error that said UnicodeDecodeError. What can I do to resolve this error?](https://stackoverflow.com/questions/16528468/while-reading-file-on-python-i-faced-the-error-that-said-unicodedecodeerror-wh) – ASGM Apr 16 '18 at 15:51
  • @ASGM The question is not about the error. The question is how to parse this text data so it can be imported as a DataFrame. Thank you – rsc05 Apr 16 '18 at 15:58
  • @ASGM Can you please remove the duplicate flag? This is not a duplicate question. Thank you – rsc05 Apr 16 '18 at 16:11
  • Yes, I've removed it. – ASGM Apr 16 '18 at 16:37

1 Answer


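You should be able to read it in directly with read_table.

import pandas as pd
# header takes the row number(s) that hold the column names
df = pd.read_table(<your file>, sep="\t", header=[rows with column info])

I'm guessing that the separator is a tab; if it is not, pass the actual delimiter as sep.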
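
If the columns turn out to be padded with spaces rather than separated by tabs, one alternative is to parse each line with regular expressions and copy the invoice fields down onto every item row. What follows is only a minimal sketch under assumptions the question does not confirm: that every invoice row starts with the invoice number, date, time and invoice total, and that every item row is a quantity, a free-text name and a total with two decimals. The names parse_invoices and to_frame, the column names, and the second index level "line" are all made up for illustration.

```python
import re

# Invoice number, date, time and invoice total, then the first item of the invoice.
INVOICE_RE = re.compile(
    r"^\s*(?P<invoice>\d+)\s+(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2})"
    r"\s+(?P<invoice_total>[\d,]+\.\d{2})\s+(?P<rest>.*)$"
)
# Quantity, free-text item name, and the item total at the end of the line.
ITEM_RE = re.compile(
    r"^\s*(?P<qty>\d+(?:\.\d+)?)\s+(?P<item>.+?)\s+(?P<total>[\d,]+\.\d{2})\s*$"
)


def to_number(text):
    """Convert a string such as '32,501.35' to a float."""
    return float(text.replace(",", ""))


def parse_invoices(lines):
    """Return one dict per item, with the invoice fields repeated on each row."""
    rows = []
    current = None  # fields of the invoice the following items belong to
    for line in lines:
        head = INVOICE_RE.match(line)
        if head:
            current = {
                "invoice": int(head.group("invoice")),
                "date": head.group("date"),
                "time": head.group("time"),
                "invoice_total": to_number(head.group("invoice_total")),
            }
            line = head.group("rest")  # the first item shares the invoice line
        item = ITEM_RE.match(line)
        if item and current is not None:
            rows.append({
                **current,
                "qty": float(item.group("qty")),
                "item": item.group("item"),
                "item_total": to_number(item.group("total")),
            })
    return rows  # header rows and blank lines match neither pattern and are skipped


def to_frame(rows):
    """Build a DataFrame indexed by (invoice, line within the invoice)."""
    import pandas as pd  # imported here so the parser needs only the standard library

    df = pd.DataFrame(rows)
    df["line"] = df.groupby("invoice").cumcount()
    return df.set_index(["invoice", "line"])
```

To try it, open the file with an explicit encoding and pass the file object straight in, for example to_frame(parse_invoices(open("file.txt", encoding="cp1256"))). The cp1256 encoding is a guess based on the garbled item names in the sample, so replace it with whatever encoding the file really uses.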

RCA