
I am using openpyxl for many Excel operations, and I am similarly looking for a package to deal with .txt files.

I have .txt files like the one below, which contain data in table-like form. I have to remove the special characters and put the original data into an Excel sheet.

=============|=============|=============|=============|
    Name     |   Address   |PhoneNumber  |     Age     |
=============|=============|=============|=============|
   Matt      |   B3/24     |  8475968    |    25       |
   John      |   A1-A2 Wing|  3335968    |    44       |

Are there any?

Martijn Pieters
Arup Rakshit
    Elaborate on what your problems are with dealing with ``.txt`` files – because you really should just open them with ``open``. – Jonas Schäfer Jan 06 '13 at 20:13
  • .txt files should contain exactly what? –  Jan 06 '13 at 20:16
  • My guess is that as Excel is mentioned - a likely answer is going to be the `csv` module... – Jon Clements Jan 06 '13 at 20:17
  • Write a parser yourself for this home-grown format - not more than five lines of code. A good coding exercise. –  Jan 06 '13 at 20:34
  • So - where is that coming from? Looks like a MySQL (or similar) table dump – Jon Clements Jan 06 '13 at 20:36
  • @JonClements Maybe. I was told to load such data into Excel. There are many such files, and the data volume of each is huge, so I need to do it programmatically. – Arup Rakshit Jan 06 '13 at 20:38
  • Write your own code instead of waiting for such a parser to fall from heaven. –  Jan 06 '13 at 20:40
  • Well, if it's coming from a datasource, get it in a proper format if possible. Otherwise, take the 4th row onwards and use the CSV module with a `|` delimiter if it's not going to appear in the table. Otherwise, take the length of the consecutive `=`s (and use that to build slicing) to apply to row 4 onwards. If you get stuck with that, then with some code, people will be able to help you with any specific problems... – Jon Clements Jan 06 '13 at 20:40
  • @JonClements +1 for the logic :) – Arup Rakshit Jan 06 '13 at 21:03
  • You didn't ask, but if you're looking to access data from Excel files, you can use the xlrd library. See question http://stackoverflow.com/questions/1213690/what-is-the-most-compatible-way-to-install-python-modules-on-a-mac for instructions on doing that (or find the appropriate version for your OS). – Ben Mordecai Jan 06 '13 at 21:07
  • @BenMordecai I didn't get your point! – Arup Rakshit Jan 06 '13 at 21:10
  • @VBSlover If you have Excel files that you want to read, the xlrd library can help you. I provided a link explaining the process of adding libraries in case you weren't familiar with it. – Ben Mordecai Jan 06 '13 at 21:25

3 Answers


Why do you need to "deal" with .txt files? Most of the time you can just open them with the open() function, and read/write them using the read() and write() methods.

Please look at the Reading and Writing Files section in the Python documentation. Start with the open() function, here.
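A minimal illustration of that workflow (the file name and contents are made up for the example):

```python
# Write a small .txt file, then read it back using only built-ins.
sample_text = "Hello\nWorld\n"
with open('sample.txt', 'w') as f:
    f.write(sample_text)

with open('sample.txt') as f:
    lines = f.read().splitlines()  # one string per line, newlines stripped

print(lines)  # ['Hello', 'World']
```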


If you have txt files written in some special format (such as CSV), you can try searching PyPI for the right module. Note that in this case Python has a great built-in csv module that handles CSV files well. Read more here.
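For the table in the question, a sketch using the built-in csv module with a `|` delimiter could look like this (the file name is made up, and the sample data is inlined from the question so the snippet is self-contained):

```python
# Parse the pipe-delimited table with the built-in csv module.
import csv

raw = (
    "=============|=============|=============|=============|\n"
    "    Name     |   Address   |PhoneNumber  |     Age     |\n"
    "=============|=============|=============|=============|\n"
    "   Matt      |   B3/24     |  8475968    |    25       |\n"
    "   John      |   A1-A2 Wing|  3335968    |    44       |\n"
)
with open('table.txt', 'w') as f:
    f.write(raw)

rows = []
with open('table.txt', newline='') as f:
    for row in csv.reader(f, delimiter='|'):
        cells = [cell.strip() for cell in row]
        if set(cells[0]) == {'='}:   # skip the '=====' separator lines
            continue
        rows.append(cells[:-1])      # trailing '|' leaves an empty last cell

print(rows[0])  # ['Name', 'Address', 'PhoneNumber', 'Age']
```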


After you edited your post, I can tell you that you can go for asciitable, which is also on PyPI and has great documentation.
However, it doesn't look like hard work to do this without the library (just saying; you should definitely use this module, it looks awesome). Just read the file line by line and split each line on the pipes. If the only content of the cells after the split is the = sign, skip that line. Otherwise, just fill the cells with the content of the line after the split.
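The line-by-line approach described above can be sketched like this (the sample data is inlined from the question so the snippet runs on its own):

```python
# Read the table line by line, split on '|', and skip separator lines.
raw = """\
=============|=============|=============|=============|
    Name     |   Address   |PhoneNumber  |     Age     |
=============|=============|=============|=============|
   Matt      |   B3/24     |  8475968    |    25       |
   John      |   A1-A2 Wing|  3335968    |    44       |
"""

table = []
for line in raw.splitlines():
    cells = [cell.strip() for cell in line.split('|')]
    if set(cells[0]) == {'='}:     # a '=====' separator row
        continue
    table.append(cells[:-1])       # trailing '|' leaves an empty last cell

print(table[1])  # ['Matt', 'B3/24', '8475968', '25']
```

Each row in `table` could then be appended to an openpyxl worksheet (which the question already uses) with `ws.append(row)`.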

Yam Mesicka

I suggest you try asciitable.

root
  • Oooo... nice... Not one of those that's made me think "OMG! Why did I spend so long writing the code to do this myself?" modules, but looks good +1 – Jon Clements Jan 06 '13 at 20:44
  • @JonClements This is why I always post here, even when I am done: there are lots of technically advanced people on `SO` whose knowledge goes far beyond mine. – Arup Rakshit Jan 06 '13 at 20:52
  • @root Can you give me a tiny code sample for this, as I have only just been introduced to this package? Thanks in advance! – Arup Rakshit Jan 06 '13 at 21:00

I always use pandas for these kinds of problems; here is how you read your file in one line:

from pandas import read_table
data = read_table('table.txt', sep='|', skiprows=[0,2]).dropna(axis=1)

Out:

     Name    Address    PhoneNumber  Age
0    Matt    B3/24       8475968     25
1    John    A1-A2 Wing  3335968     44

After that you can access your columns with data['Name'], manipulate, filter, etc. Finally you write to a file with data.to_csv() or data.to_excel(). See the docs here (especially the groupby section).

EDIT: because the column names are imported with surrounding whitespace (' Name ' instead of 'Name'), you also need to do:

data.columns = data.columns.map(str.strip)

After that, data['Name'], data['Address'], etc. will work correctly. .dropna() is needed to remove the last, empty column that gets wrongly imported because there is a '|' separator at the end of each line in your file. Try it without .dropna() to see what happens.
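Putting the whole answer together as one self-contained sketch (using io.StringIO in place of the file, and pd.read_csv, which accepts the same sep/skiprows arguments as read_table; stripping the cell values as well as the column names is an extra step not in the original answer):

```python
# Full pipeline: read the pipe-delimited table, drop the bogus trailing
# column, and strip whitespace from headers and string cells.
import io
import pandas as pd

raw = """\
=============|=============|=============|=============|
    Name     |   Address   |PhoneNumber  |     Age     |
=============|=============|=============|=============|
   Matt      |   B3/24     |  8475968    |    25       |
   John      |   A1-A2 Wing|  3335968    |    44       |
"""

data = pd.read_csv(io.StringIO(raw), sep='|', skiprows=[0, 2]).dropna(axis=1)
data.columns = data.columns.map(str.strip)        # ' Name ' -> 'Name'
for col in data.select_dtypes(include='object'):  # strip the cell values too
    data[col] = data[col].str.strip()

print(data['Name'].tolist())  # ['Matt', 'John']
# data.to_excel('table.xlsx') would then write it out (needs an Excel engine
# such as openpyxl installed).
```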

elyase