
I am using openpyxl for many Excel operations, and I am similarly looking for a package to deal with .txt files.

I have .txt files like the one below, which contain data in table-like form. I have to remove the special characters and put the original data into an Excel sheet.

=============|=============|=============|=============|
    Name     |   Address   |PhoneNumber  |     Age     |
=============|=============|=============|=============|
   Matt      |   B3/24     |  8475968    |    25       |
   John      |   A1-A2 Wing|  3335968    |    44       |

Are there any?

Martijn Pieters
Arup Rakshit
    Elaborate on what your problems are with dealing with ``.txt`` files – because you really should just open them with ``open``. – Jonas Schäfer Jan 06 '13 at 20:13
  • .txt files should contain exactly what? –  Jan 06 '13 at 20:16
  • My guess is that as Excel is mentioned - a likely answer is going to be the `csv` module... – Jon Clements Jan 06 '13 at 20:17
  • Write a parser yourself for this home-grown format - not more than five lines of code. A good coding exercise. –  Jan 06 '13 at 20:34
  • So - where is that coming from? Looks like a MySQL (or similar) table dump – Jon Clements Jan 06 '13 at 20:36
  • @JonClements Maybe. I was told to load such data into Excel. There are many such files, and the data volume of each is huge, so I need to do it programmatically. – Arup Rakshit Jan 06 '13 at 20:38
  • Write your own code instead of waiting for such a parser to fall from heaven. –  Jan 06 '13 at 20:40
  • Well, if it's coming from a datasource, get it in a proper format if possible. Otherwise, take the 4th row onwards and use the CSV module with a `|` delimiter if it's not going to appear in the table. Otherwise, take the length of the consecutive `=`s (and use that to build slicing) to apply to row 4 onwards. If you get stuck with that, then with some code, people will be able to help you with any specific problems... – Jon Clements Jan 06 '13 at 20:40
  • @JonClements +1 for the logic :) – Arup Rakshit Jan 06 '13 at 21:03
  • You didn't ask, but if you're looking to access data from Excel files, you can use the xlrd library. See question http://stackoverflow.com/questions/1213690/what-is-the-most-compatible-way-to-install-python-modules-on-a-mac for instructions on doing that (or find the appropriate version for your OS). – Ben Mordecai Jan 06 '13 at 21:07
  • @BenMordecai I didn't get your point! – Arup Rakshit Jan 06 '13 at 21:10
  • @VBSlover If you have Excel files that you want to read, the xlrd library can help you. I provided a link explaining the process of adding libraries in case you weren't familiar with it. – Ben Mordecai Jan 06 '13 at 21:25

3 Answers


Why do you need to "deal" with .txt files? Most of the time you can just open them with the open() function, and read/write them using the read() and write() methods.

Please look at the Reading and Writing Files section in the Python documentation. Start with the open() function, here.
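A minimal illustration of that workflow (the file name and contents are made up for the example):

```python
# Write a small .txt file, then read it back using only built-ins.
sample_text = "Hello\nWorld\n"
with open('sample.txt', 'w') as f:
    f.write(sample_text)

with open('sample.txt') as f:
    lines = f.read().splitlines()  # one string per line, newlines stripped

print(lines)  # ['Hello', 'World']
```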


If you have txt files written in some special format (such as CSV), you can try searching PyPI for the right module. Note that in this case Python has a great built-in csv module that handles CSV files well. Read more here.
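For the table in the question, a sketch using the built-in csv module with a `|` delimiter could look like this (the file name is made up, and the sample data is inlined from the question so the snippet is self-contained):

```python
# Parse the pipe-delimited table with the built-in csv module.
import csv

raw = (
    "=============|=============|=============|=============|\n"
    "    Name     |   Address   |PhoneNumber  |     Age     |\n"
    "=============|=============|=============|=============|\n"
    "   Matt      |   B3/24     |  8475968    |    25       |\n"
    "   John      |   A1-A2 Wing|  3335968    |    44       |\n"
)
with open('table.txt', 'w') as f:
    f.write(raw)

rows = []
with open('table.txt', newline='') as f:
    for row in csv.reader(f, delimiter='|'):
        cells = [cell.strip() for cell in row]
        if set(cells[0]) == {'='}:   # skip the '=====' separator lines
            continue
        rows.append(cells[:-1])      # trailing '|' leaves an empty last cell

print(rows[0])  # ['Name', 'Address', 'PhoneNumber', 'Age']
```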


After you edited your post, I can tell you that you can go for asciitable, which is also on PyPI and has great documentation.
However, it doesn't look like hard work to do this without the library (just saying; you should definitely use this module, it looks awesome). Just read the file line by line and split each line on the pipes. If the only content of the cells after the split is the = sign, skip that line. Otherwise, just fill the cells with the content of the line after the split.
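The line-by-line approach described above can be sketched like this (the sample data is inlined from the question so the snippet runs on its own):

```python
# Read the table line by line, split on '|', and skip separator lines.
raw = """\
=============|=============|=============|=============|
    Name     |   Address   |PhoneNumber  |     Age     |
=============|=============|=============|=============|
   Matt      |   B3/24     |  8475968    |    25       |
   John      |   A1-A2 Wing|  3335968    |    44       |
"""

table = []
for line in raw.splitlines():
    cells = [cell.strip() for cell in line.split('|')]
    if set(cells[0]) == {'='}:     # a '=====' separator row
        continue
    table.append(cells[:-1])       # trailing '|' leaves an empty last cell

print(table[1])  # ['Matt', 'B3/24', '8475968', '25']
```

Each row in `table` could then be appended to an openpyxl worksheet (which the question already uses) with `ws.append(row)`.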

Yam Mesicka

I suggest you try asciitable.

root
  • Oooo... nice... Not one of those that's made me think "OMG! Why did I spend so long writing the code to do this myself?" modules, but looks good +1 – Jon Clements Jan 06 '13 at 20:44
  • @JonClements This is why I always post here, even when I am done: there are lots of technically advanced people on `SO` whose knowledge goes far beyond mine. – Arup Rakshit Jan 06 '13 at 20:52
  • @root Can you give me a tiny code sample for this, as I have only just been introduced to this package? Thanks in advance! – Arup Rakshit Jan 06 '13 at 21:00

I always use pandas for these kinds of problems; here is how you read your file in one line:

from pandas import read_table
data = read_table('table.txt', sep='|', skiprows=[0,2]).dropna(axis=1)

Out:

     Name    Address    PhoneNumber  Age
0    Matt    B3/24       8475968     25
1    John    A1-A2 Wing  3335968     44

After that you can access your columns with data['Name'], manipulate, filter, etc. Finally you write to a file with data.to_csv() or data.to_excel(). See the docs here (especially the groupby section).

EDIT: because the column names are imported with surrounding whitespace (' Name ' instead of 'Name'), you also need to do:

data.columns = data.columns.map(str.strip)

After that, data['Name'], data['Address'], etc. will work correctly. .dropna() is needed to remove the last, empty column that gets wrongly imported because there is a '|' separator at the end of each line in your file. Try it without .dropna() to see what happens.
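Putting the whole answer together as one self-contained sketch (using io.StringIO in place of the file, and pd.read_csv, which accepts the same sep/skiprows arguments as read_table; stripping the cell values as well as the column names is an extra step not in the original answer):

```python
# Full pipeline: read the pipe-delimited table, drop the bogus trailing
# column, and strip whitespace from headers and string cells.
import io
import pandas as pd

raw = """\
=============|=============|=============|=============|
    Name     |   Address   |PhoneNumber  |     Age     |
=============|=============|=============|=============|
   Matt      |   B3/24     |  8475968    |    25       |
   John      |   A1-A2 Wing|  3335968    |    44       |
"""

data = pd.read_csv(io.StringIO(raw), sep='|', skiprows=[0, 2]).dropna(axis=1)
data.columns = data.columns.map(str.strip)        # ' Name ' -> 'Name'
for col in data.select_dtypes(include='object'):  # strip the cell values too
    data[col] = data[col].str.strip()

print(data['Name'].tolist())  # ['Matt', 'John']
# data.to_excel('table.xlsx') would then write it out (needs an Excel engine
# such as openpyxl installed).
```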

elyase