
I have a txt file that looks like this:

1000  lewis     hamilton  36
1001 sebastian vettel 34
1002  lando  norris  21

I want it to look like this:

[screenshot of the desired result in Excel]

I tried the solution here, but it gave me a blank Excel file and an error when trying to open it.

There are more than one million lines, and each line contains around 10 columns.

And one last thing: I am not 100% sure whether the columns are tab-delimited, because some columns look like they have more space between them than others, but when I press backspace once they stick to each other, so I guess they are.
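
A quick way to check what the separators actually are is to print the raw first line (a minimal Python sketch; your_file.txt is a placeholder name):

with open('your_file.txt') as f:
    first_line = f.readline()
# repr() shows tabs as \t, so you can see whether the gaps are tabs or spaces
print(repr(first_line))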

Ahmet Bilgin
  • You can open this in pandas by using whitespace as the delimiter, then save to a CSV (comma-separated values) or TSV (tab-separated values). Excel will happily open either of those formats. – jkr Oct 14 '21 at 13:43
  • Open Excel and choose as delimiter then – Corralien Oct 14 '21 at 13:53
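
A minimal sketch of the CSV route jkr suggests (whitespace delimiter in, CSV out; file names are placeholders):

import pandas as pd

# split on any run of whitespace, since the exact delimiter is unknown
df = pd.read_csv('your_file.txt', sep=r'\s+', header=None)
# Excel opens CSV files directly
df.to_csv('your_file.csv', index=False, header=False)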

2 Answers


You can use pandas read_csv to read your txt file and then save it as an Excel file with .to_excel:

import pandas as pd

df = pd.read_csv('your_file.txt', delim_whitespace=True)
df.to_excel('your_file.xlsx', index=False)

Here is some documentation:

pandas.read_csv : https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

.to_excel : https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
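
One caveat given the file size: an .xlsx worksheet holds at most 1,048,576 rows, so more than a million lines may not fit on a single sheet. A hedged sketch of splitting the output across sheets (file names are placeholders, and header=None is assumed as in the other answer):

import pandas as pd

df = pd.read_csv('your_file.txt', delim_whitespace=True, header=None)

sheet_limit = 1_048_576  # maximum rows per .xlsx worksheet
with pd.ExcelWriter('your_file.xlsx') as writer:
    for i, start in enumerate(range(0, len(df), sheet_limit)):
        df.iloc[start:start + sheet_limit].to_excel(
            writer, sheet_name=f'Sheet{i + 1}', index=False, header=False)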

  • The OP says they don't know if it is actually tab-delimited. It's probably better to use `delim_whitespace=True`. Also, if you are using `to_excel`, why would you use a `.csv` extension? – jkr Oct 14 '21 at 13:52
  • For the .csv extension, it's a mistake ^^ and for the delim you're right. – Youssef_boughanmi Oct 14 '21 at 13:55

If you're not sure how the fields are separated, you can use the regex \s+ to split on runs of whitespace:

import pandas as pd
df = pd.read_csv('f1.txt', sep=r"\s+", header=None)
# you might need: pip install openpyxl
df.to_excel('f1.xlsx', 'Sheet1')

Example of randomly separated fields (f1.txt):

1000  lewis     hamilton  2 36
1001 sebastian vettel 8 34
1002  lando  norris   6 21

If some lines have more columns than the first one, causing:

ParserError: Error tokenizing data. C error: Expected 5 fields in line 5, saw 6

You can ignore those by using:

df = pd.read_csv('f1.txt', sep=r"\s+", header=None, error_bad_lines=False)

This is an example of data:

1000  lewis     hamilton  2 36
1001 sebastian vettel 8 34
1002  lando  norris     6 21
1003 charles leclerc           1 3
1004 carlos sainz  ferrari 2 2 

The last line will be ignored:

b'Skipping line 5: expected 5 fields, saw 6\n'
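
Note: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions the equivalent option is on_bad_lines:

df = pd.read_csv('f1.txt', sep=r"\s+", header=None, on_bad_lines='skip')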

Bill
  • This one worked with my small example, so thank you for that. I don't know if you are familiar with it, but I have an error saying pandas.errors.ParserError: Error tokenizing data. C error: Expected 19 fields in line 4, saw 20. I think this happened because even though there are 18 or 19 columns, some columns have 2 or 3 words inside them. Any way to fix this? – Ahmet Bilgin Oct 14 '21 at 14:17
  • You can try: pd.read_csv('f1.txt', sep="\s+", header=None, error_bad_lines=False) to skip those lines. – Bill Oct 14 '21 at 14:27
  • So if I use this, those lines will be gone? That is not exactly what I want. – Ahmet Bilgin Oct 14 '21 at 14:59
  • Then you should check those lines (use the warnings 'skipping line LINE') and look for some patterns that can help split the columns. Can you put some examples of these lines here? – Bill Oct 14 '21 at 15:23
  • Of course, I was going to, but the lines were too long to fit here, so here are the pictures: this is what the first 4 lines look like https://imgur.com/a/raL9GsG, this is what happens when I run the code on these lines https://imgur.com/a/fAi3BeF (the program deletes the 4th line because it has one extra text after the date), and this is what I want https://imgur.com/a/fg4uFTg. For the 4th line, we can delete that extra text or put it together with the next one; both are okay by me. – Ahmet Bilgin Oct 14 '21 at 16:06
  • Are all the problem lines like the one you mentioned? And can you paste your 4 lines here: https://regex101.com/ and share the link (via 'share regex' at top left)? It would be a lot easier. – Bill Oct 15 '21 at 14:05
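
A hedged sketch of one way to keep such lines instead of skipping them, assuming the multi-word column is the last one (the expected column count of 19 comes from the error above; if the overflow happens mid-line, the split rule would need adapting):

import pandas as pd

n_cols = 19  # expected number of columns, from the error message above

rows = []
with open('f1.txt') as f:
    for line in f:
        # split on whitespace at most n_cols - 1 times,
        # so any extra words stay inside the last field
        rows.append(line.strip().split(maxsplit=n_cols - 1))

df = pd.DataFrame(rows)
df.to_excel('f1.xlsx', index=False, header=False)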