Convert a space delimited file to comma separated values file in python

Question

I am very new to Python. I know that this has already been asked, and I apologise, but the difference in this new situation is that the number of spaces between the strings is not constant. I have a file, named coord, that contains the following space-delimited strings:

   1  C       6.00    0.000000000    1.342650315    0.000000000
   2  C       6.00    0.000000000   -1.342650315    0.000000000
   3  C       6.00    2.325538562    2.685300630    0.000000000
   4  C       6.00    2.325538562   -2.685300630    0.000000000
   5  C       6.00    4.651077125    1.342650315    0.000000000
   6  C       6.00    4.651077125   -1.342650315    0.000000000
   7  C       6.00   -2.325538562    2.685300630    0.000000000
   8  C       6.00   -2.325538562   -2.685300630    0.000000000
   9  C       6.00   -4.651077125    1.342650315    0.000000000
  10  C       6.00   -4.651077125   -1.342650315    0.000000000
  11  H       1.00    2.325538562    4.733763602    0.000000000
  12  H       1.00    2.325538562   -4.733763602    0.000000000
  13  H       1.00   -2.325538562    4.733763602    0.000000000
  14  H       1.00   -2.325538562   -4.733763602    0.000000000
  15  H       1.00    6.425098097    2.366881801    0.000000000
  16  H       1.00    6.425098097   -2.366881801    0.000000000
  17  H       1.00   -6.425098097    2.366881801    0.000000000
  18  H       1.00   -6.425098097   -2.366881801    0.000000000

Please note the spaces before the start of each string in the first column. I have tried the following to convert it to CSV:

with open('coord') as infile, open('coordv', 'w') as outfile:
    outfile.write(infile.read().replace("  ", ", "))

# Unneeded columns are deleted from the csv

input = open('coordv', 'rb')
output = open('coordcsvout', 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
    if row:
        writer.writerow(row)
input.close()
output.close()

with open("coordcsvout","rb") as source:
    rdr= csv.reader( source )
    with open("coordbarray","wb") as result:
        wtr= csv.writer(result)
        for r in rdr:
            wtr.writerow( (r[5], r[6], r[7]) )

When I run the script, I get the following for the coordv in the very first part of the script, which is of course very wrong:

,  1, C, , ,  6.00, , 0.000000000, , 1.342650315, , 0.000000000
,  2, C, , ,  6.00, , 0.000000000,  -1.342650315, , 0.000000000
,  3, C, , ,  6.00, , 2.325538562, , 2.685300630, , 0.000000000
,  4, C, , ,  6.00, , 2.325538562,  -2.685300630, , 0.000000000
,  5, C, , ,  6.00, , 4.651077125, , 1.342650315, , 0.000000000
,  6, C, , ,  6.00, , 4.651077125,  -1.342650315, , 0.000000000
,  7, C, , ,  6.00,  -2.325538562, , 2.685300630, , 0.000000000
,  8, C, , ,  6.00,  -2.325538562,  -2.685300630, , 0.000000000
,  9, C, , ,  6.00,  -4.651077125, , 1.342650315, , 0.000000000
, 10, C, , ,  6.00,  -4.651077125,  -1.342650315, , 0.000000000
, 11, H, , ,  1.00, , 2.325538562, , 4.733763602, , 0.000000000
, 12, H, , ,  1.00, , 2.325538562,  -4.733763602, , 0.000000000
, 13, H, , ,  1.00,  -2.325538562, , 4.733763602, , 0.000000000
, 14, H, , ,  1.00,  -2.325538562,  -4.733763602, , 0.000000000
, 15, H, , ,  1.00, , 6.425098097, , 2.366881801, , 0.000000000
, 16, H, , ,  1.00, , 6.425098097,  -2.366881801, , 0.000000000
, 17, H, , ,  1.00,  -6.425098097, , 2.366881801, , 0.000000000
, 18, H, , ,  1.00,  -6.425098097,  -2.366881801, , 0.000000000

I have tried different possibilities in .replace without any success, and so far I haven't found any source of information on how I could do this. What would be the best way to get comma-separated values from this coord file? What I'm interested in is then using the csv module in Python to choose columns 4:6, and finally using numpy to import them as follows:

from numpy import genfromtxt
cocmatrix = genfromtxt('input', delimiter=',')

I'd be very glad if somebody could help me with this problem.

If the sole purpose is just to convert from one type to another, bash script would be easy, right? — Ananta, Nov 03 '13 at 23:48
I know how to use sed, awk, bash scripting, etc. However, my purpose is not only to convert from one type file to another. I'm processing the output file from a quantum chemistry program to do some operations in order to automatize later lots of calculations based on considering the center of charges of localized molecular orbitals. — muammar, Nov 04 '13 at 00:24
It looks like a fixed width file (fields in set position). Here is question on fixed widths: http://stackoverflow.com/questions/4914008/efficient-way-of-parsing-fixed-width-files-in-python, alternatively you could use slicing to split it up http://stackoverflow.com/questions/509211/pythons-slice-notation — Bruce Martin, Nov 04 '13 at 03:10
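
The slicing idea from the last comment could look roughly like the sketch below. The column widths in WIDTHS are an assumption measured from the sample rows in the question, and split_fixed_width is a hypothetical helper name; a real output file may use different widths.

```python
from itertools import accumulate

# Assumed column widths, measured from the sample rows in the question.
WIDTHS = (4, 3, 11, 15, 15, 15)

def split_fixed_width(line, widths=WIDTHS):
    """Cut a line at fixed offsets and strip the padding from each field."""
    ends = list(accumulate(widths))   # end offset of every column
    starts = [0] + ends[:-1]          # start offset of every column
    return [line[s:e].strip() for s, e in zip(starts, ends)]

row = "   1  C       6.00    0.000000000    1.342650315    0.000000000"
print(",".join(split_fixed_width(row)))
```

Unlike splitting on whitespace, this keeps working when a field is empty or when values themselves contain spaces, as long as the widths are right.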

the wolf · Answer 1 · 2013-11-03T23:40:05.440

16

You can use csv:

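import csv

# ur_infile and ur_outfile are placeholders for your input and output paths.
# newline='' (Python 3) stops the csv module from writing blank rows on Windows.
with open(ur_infile) as fin, open(ur_outfile, 'w', newline='') as fout:
    o = csv.writer(fout)
    for line in fin:
        o.writerow(line.split())  # split() with no argument splits on any run of whitespace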
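
As a sketch of how this extends to the column selection the question asks about, the snippet below keeps only the last three fields (the x, y, z coordinates) of each row. The function name coords_to_csv and the in-memory sample are illustrative assumptions, not part of the original answer.

```python
import csv
import io

def coords_to_csv(lines, out):
    """Write the last three whitespace-separated fields of each line as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        fields = line.split()   # splits on any run of whitespace
        if fields:              # skip blank lines
            writer.writerow(fields[-3:])

sample = [
    "   1  C       6.00    0.000000000    1.342650315    0.000000000\n",
    "   2  C       6.00    0.000000000   -1.342650315    0.000000000\n",
]
buf = io.StringIO()
coords_to_csv(sample, buf)
print(buf.getvalue())
```

With real files, pass the open file objects instead of the list and the StringIO buffer; in Python 3, open the output file with newline='' so the csv module controls the line endings.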

edited Nov 03 '13 at 23:40

answered Nov 03 '13 at 23:35

the wolf


1

Note that the `.strip()` is superfluous here; `line.split()` already does that. – DSM Nov 03 '13 at 23:39
@thewolf getting an extra empty row for each row printed (otherwise works great)...any idea why this could be happening? – Chris Feb 25 '17 at 17:56

Daniel · Answer 2 · 2013-11-03T23:46:47.730

You can use pandas. I have written your data to data.csv:

>>> import pandas as pd
>>> df = pd.read_csv('data.csv', sep=r'\s+', header=None)
>>> df
     0  1  2         3         4  5
0    1  C  6  0.000000  1.342650  0
1    2  C  6  0.000000 -1.342650  0
2    3  C  6  2.325539  2.685301  0
3    4  C  6  2.325539 -2.685301  0
4    5  C  6  4.651077  1.342650  0
5    6  C  6  4.651077 -1.342650  0
...

A nice feature is that you can access the underlying numpy array through df.values:

>>> type(df.values)
<type 'numpy.ndarray'>

To save the data frame with comma delimiters:

>>> df.to_csv('data_out.csv',header=None)

Pandas is a great library for managing large amounts of data, and as a bonus it works well with numpy. There is also a very good chance that this will be much faster than using the csv module.

j011y · Accepted Answer · 2013-11-04T05:12:27.707

7

Replace the first part of your script with this. It's not super pretty, but it will give you CSV format.

with open('coord') as infile, open('coordv', 'w') as outfile:
    for line in infile:
        outfile.write(" ".join(line.split()).replace(' ', ','))
        outfile.write(",") # trailing comma shouldn't matter

If you want the output file to have each record on its own line, you could add outfile.write("\n") at the end of the for loop, but I don't think the code that follows this will work with it like that.
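
A minimal sketch of the one-record-per-line variant, assuming every non-blank line should become one CSV row. It joins the split fields directly with commas instead of going through a space. The function name spaces_to_commas and the in-memory sample are illustrative only.

```python
import io

def spaces_to_commas(infile, outfile):
    """Collapse each run of whitespace into one comma, one record per line."""
    for line in infile:
        fields = line.split()
        if fields:  # skip blank lines
            outfile.write(",".join(fields) + "\n")

src = io.StringIO("   1  C       6.00    0.000000000\n  10  C       6.00   -4.651077125\n")
dst = io.StringIO()
spaces_to_commas(src, dst)
print(dst.getvalue())
```

For files on disk, replace the StringIO objects with open('coord') and open('coordv', 'w').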

edited Nov 04 '13 at 05:12

answered Nov 03 '13 at 23:30

j011y


2

Have you actually tested that code? The input file has sequences of multiple spaces which translates to sequences of empty fields i.e. `['', '', '', '1', '', 'C', '', '', '', '', '', '', '6.00', '', '', '', '0.000000000', '', '', '', '1.342650315', '', '', '', '0.000000000']` for the first row. -1 because it doesn't work. – Cristian Ciupitu Nov 03 '13 at 23:49
i know it does, which is why i suggested the newline character. – j011y Nov 05 '13 at 04:43
Sorry, I missed that, although I thought that the intent of the author was clear. I've removed the -1. – Cristian Ciupitu Nov 05 '13 at 19:52
I wanted to add that this answer is much more general. I am now instead using this as answer for this question. – muammar Jan 05 '14 at 16:04
I have same problem, but the values has spaces too. – ira Apr 13 '22 at 08:41

score 1 · Answer 4 · answered Nov 04 '13 at 00:01

>>> a = 'cah  1  C       6.00    0.000000000    1.342650315    0.000000000'
=>  a = 'cah  1  C       6.00    0.000000000    1.342650315    0.000000000'

>>> a.split()
=>  ['cah', '1', 'C', '6.00', '0.000000000', '1.342650315', '0.000000000']

>>> ','.join(a.split())
=>  'cah,1,C,6.00,0.000000000,1.342650315,0.000000000'

>>> ['"' + x + '"' for x in a.split()]
=>  ['"cah"', '"1"', '"C"', '"6.00"', '"0.000000000"', '"1.342650315"', '"0.000000000"']

>>> ','.join(['"' + x + '"' for x in a.split()])
=>  '"cah","1","C","6.00","0.000000000","1.342650315","0.000000000"'
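
If quoted fields are what you want, the csv module can add the quotes for you. The sketch below produces the same quoted line with csv.QUOTE_ALL, using an in-memory buffer purely for illustration; the name quoted_line is not from the original answer.

```python
import csv
import io

a = 'cah  1  C       6.00    0.000000000    1.342650315    0.000000000'

buf = io.StringIO()
# QUOTE_ALL wraps every field in double quotes.
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(a.split())
quoted_line = buf.getvalue()
print(quoted_line)
```

Compared with concatenating the quotes by hand, the writer also escapes any double quotes that appear inside a field.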

score 1 · Answer 5 · edited Oct 14 '19 at 20:01


To convert spaces to commas (just change the filenames to whatever you need):

with open('filename') as infile, open('output', 'w') as outfile:
    outfile.write(infile.read().replace(" ", ","))

To convert commas to spaces:

with open('filename') as infile, open('output', 'w') as outfile:
    outfile.write(infile.read().replace(",", " "))
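
One caveat, shown here as a small sketch: str.replace(" ", ",") turns every single space into a comma, so input with runs of spaces (like the file in the question) ends up with empty fields. The sample line and the variable names naive and clean are illustrative only.

```python
line = "   1  C       6.00    0.000000000"

# Replacing each space individually leaves an empty field between adjacent spaces.
naive = line.replace(" ", ",").split(",")

# split() with no argument treats any run of whitespace as one separator.
clean = line.split()

print(naive.count(""), "empty fields with replace")
print(clean)
```

The plain replace is therefore only safe when fields are separated by exactly one space.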

edited Oct 14 '19 at 20:01

Aleksander Lidtke


answered Oct 14 '19 at 18:27

Majid Hoseiny


score 0 · Answer 6 · answered Nov 03 '13 at 23:51


Why not read the file line by line? Split each line into a list, then rejoin the list with ','.

answered Nov 03 '13 at 23:51

user1667218


1

Show us some code. Besides this has been already [suggested](http://stackoverflow.com/a/19759560/12892) by [the wolf](http://stackoverflow.com/users/455276/the-wolf). – Cristian Ciupitu Nov 03 '13 at 23:55

score 0 · Answer 7 · answered Nov 04 '13 at 00:41

The csv module is good, or here's a way to do it without:

#!/usr/local/cpython-3.3/bin/python

with open('input-file.csv', 'r') as infile, open('output.csv', 'w') as outfile:
    for line in infile:
        fields = line.split()
        outfile.write('{}\n'.format(','.join(fields)))

score 0 · Answer 8 · edited Jun 20 '20 at 09:12


To merge multiple text files into one CSV:

import csv

n = 3  # example value: number of input files (input0.txt, input1.txt, ...)
for x in range(n):
    # 'a' appends, so rows from every input file accumulate in output.csv
    with open('input{}.txt'.format(x)) as fin, open('output.csv', 'a', newline='') as fout:
        csv_output = csv.writer(fout)
        for line in fin:
            csv_output.writerow(line.split())

edited Jun 20 '20 at 09:12

Community


answered Nov 22 '19 at 10:18

Ranjeet R Patil

