converting a space delimited file to a CSV

Question

I have a text file containing tabular data. What I need to do is automate the task of writing to a new text file that is comma delimited instead of space delimited, extract a few columns from existing data, reorder the columns.

This is a snippet of the first 4 lines of the original data:

Number of rows: 8542
 Algorithm  |Date   |Time   |Longitude  |Latitude   |Country    
 1  2000-01-03  215926.688  -0.262  35.813  Algeria 
 1  2000-01-03  215926.828  -0.284  35.817  Algeria

Here is what I want in the end:

Longitude,Latitude,Country,Date,Time
-0.262,35.813,Algeria,2000-01-03,215926.688

Any tips on how to approach this?

There was something wrong with how you presented your snippet, I believe it's what you intended. — Jeff Mercado, Jan 14 '12 at 00:17
Yes the order is important. the latitude/longitude fields need to be the first two columns in the new output — MLau, Jan 14 '12 at 00:29
Seems like the OP can't join the chat until he has 20 reputation... How can I donate him some? — Paulo Scardine, Jan 14 '12 at 01:47

Paulo Scardine · Accepted Answer · 2012-01-14T01:24:32.077

7

I guess the file is separated by tabs, not spaces.

If so, you can try something like:

input_file = open('some_tab_separated_file.txt', 'r')
output_file = open('some_tab_separated_file.csv', 'w')
input_file.readline() # skip first line 
for line in input_file:
    (a, date, time, lon, lat, country) = line.strip().split('\t')
    output_file.write(','.join([lon, lat, country, date, time]) + '\n')
input_file.close()
output_file.close()

This code is untested, any bug is left for you as exercise.

edited Jan 14 '12 at 01:24

answered Jan 14 '12 at 00:42

Paulo Scardine

73,447
11
124
153

Hi, it looks like this would reorder the field header names. Will it also reorder the data in the columns or is there another step? – MLau Jan 14 '12 at 00:50
providing its tab-separated, should reorder header and data inside a `for line in file` loop. – Paulo Scardine Jan 14 '12 at 00:55
Still working on your method. Getting this: "failed to run script - syntax error - can't assign to literal" Changed all the field names to the correct name. – MLau Jan 14 '12 at 01:28
Could you join me at http://chat.stackoverflow.com/rooms/6712/paulos ?? I will help you. – Paulo Scardine Jan 14 '12 at 01:37
Paulo - I have to have a "reputation" of 20 to use the chat room. the reply function is disabled. Am at 13 now – MLau Jan 14 '12 at 01:42
ok, I'm having trouble with the 5th line. I'm replacing the field names that you have provided in the above code, but get a syntax error. Here is the line: (Algorithm, |Date, |Time, |Longitude, |Latitude, |Country, |Continent, |Region, |Admin, |GlobCover, |Biomass, |Koeppen) = line.split('\t') – MLau Jan 14 '12 at 01:55
That did it! Your code did the trick. Thank You!! -so what is the "|" character for in this example? – MLau Jan 14 '12 at 01:58
I think "|" in python is the bitwise "or" operator, so you cant use it in variable names. – Paulo Scardine Jan 14 '12 at 02:08
That makes sense. Thanks for sharing your knowledge, this has been a great experience and resource. – MLau Jan 14 '12 at 02:13
@MLau: I hope you increase your reputation soon in order to be able to use the chat. Good luck! – Paulo Scardine Jan 14 '12 at 02:18

Bruno · Answer 2 · 2012-01-14T02:00:40.213

You could use the csv module and a reader with the ' ' delimiter to read your data in, and use the a writer from the same module (with a comma delimiter) to produce the output.

In fact, the first example in the csv module documentation uses delimiter=' '.

You can use a DictReader/DictWriter and specify the order of the columns in its constructor (fieldnames list: different for reader/writer if you want to re-order) to output the entries in the order you wish.

(You may need to skip/ignore your first two rows when producing the output.)

EDIT:

Here is an example for dealing with multi-word country names:

import cStringIO
import csv

f = cStringIO.StringIO("""A B C
1 2 Costa Rica
3 4 Democratic Republic of the Congo
""")

r = csv.DictReader(f, delimiter=' ', restkey='rest')
for row in r:
    if row.get('rest'):
        row['C'] += " %s" % (" ".join(row['rest']))
    print 'A: %s, B: %s, C: %s' % (row['A'], row['B'], row['C'])

Use the restkey= and concatenate the dict entry for that value, which is a list of what's left over (here restkey='rest'). This prints:

A: 1, B: 2, C: Costa Rica
A: 3, B: 4, C: Democratic Republic of the Congo

What happens with a country like "Costa Rica" if you use `delimiter=' '`? — Paulo Scardine, Jan 14 '12 at 01:40
@PauloScardine, you should be able to use the `restkey` for this (see update). Of course, in the initial example, this will require a bit of clean up with the initial two row, but even when using a plain (non `DictReader`), it should be possible to obtain what's left and join it together. — Bruno, Jan 14 '12 at 02:01

score 0 · Answer 3 · answered Feb 06 '18 at 12:10

I guess the important idea is that you have to use '\t' as the delimiter @Paulo Scardine.

I just wanted to add that pandas is a very good library for handling column data.

>>> src = 'path/to/file'
>>> dest = 'path/to/dest_csv'
>>> column_names = ['names', 'of', 'columns']

>>> df = pd.read_csv(src, delimiter='\t', names=column_names)

# Do something in pandas if you need to

>>> df.to_csv(dest, index=False, sep = ';')

score 0 · Answer 4 · answered Jan 14 '12 at 00:17

0

str.split() without any arguments will split by any length of whitespace. operator.itemgetter() takes multiple arguments, and will return a tuple.

answered Jan 14 '12 at 00:17

Ignacio Vazquez-Abrams

776,304
153
1,341
1,358

converting a space delimited file to a CSV

4 Answers4

Linked