Dump a NumPy array into a csv file

Question

How do I dump a 2D NumPy array into a csv file in a human-readable format?

score 1221 · Accepted Answer · edited Aug 26 '17 at 05:39

numpy.savetxt saves an array to a text file.

import numpy
a = numpy.asarray([ [1,2,3], [4,5,6], [7,8,9] ])
numpy.savetxt("foo.csv", a, delimiter=",")
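Several comments below ask about the `fmt`, `header` and `comments` keyword arguments. The following is a minimal sketch that combines them. It assumes you want three decimals per value and a header row without the `# ` prefix that savetxt adds by default. The column names `col1..col3` are placeholders.

```python
import numpy as np

a = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# fmt controls how each value is written (the default is '%.18e').
# header is written as the first line; comments='' stops savetxt from
# prefixing that header line with '# '.
np.savetxt("foo.csv", a, delimiter=",", fmt="%.3f",
           header="col1,col2,col3", comments="")

with open("foo.csv") as f:
    print(f.read())
```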

edited Aug 26 '17 at 05:39

cs95

answered May 21 '11 at 10:10

Jim Brissom

is this preferred over looping through the array by dimension? I'm guessing so. – Ehtesh Choudhury May 21 '11 at 10:13
The array is an ndarray. I hope it adds up. – Dexter May 21 '11 at 16:53
You can also change the format of each figure with the fmt keyword. The default is '%.18e', which can be hard to read; you can use '%.3e' so only 3 decimals are shown. – Andrea Zonca May 22 '11 at 17:25
Andrea, Yes I used %10.5f. It was pretty convenient. – Dexter May 23 '11 at 09:47
Your method works well for numerical data, but it throws an error for a `numpy.array` of strings. Could you suggest a way to save a `numpy.array` containing strings as CSV? – Ébe Isaac Mar 25 '16 at 14:31
What does the scipy documentation mean when it says delimiter is the character or string separating columns? When I use savetxt() it throws everything in the same column. Also, how do we go about saving in .tsv format? Do we use 4 spaces? The scipy documentation doesn't touch on .tsv at all, but .tsv is such a common format, there must be a way. Any thoughts? – Arash Howaida Sep 29 '16 at 17:58
@ÉbeIsaac You can specify the format as string as well: `fmt='%s'` – Luis Apr 06 '17 at 16:34
You can even set different formats for each column, eg. `fmt = '%.4f, %.8f'` to write 4 and 8 decimals in the first and second column, respectively. – Adrian Aug 29 '17 at 15:58
TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e') – Sohaib Aslam May 15 '18 at 20:13
Should this answer have `comments=''` to get rid of the weird hash symbol at the start of the column names? – Dave C Feb 06 '19 at 14:19
@EhteshChoudhury Usually when there is a function you can call instead of creating a loop that accomplishes the same thing, the function call is preferred since it makes the code simpler. (If calling the function wouldn't be the preferred method, why would the function in that case exist?) – HelloGoodbye Oct 26 '19 at 15:33
This only works when it's a numerical array. If it's an array of object (string), you need third argument `fmt='%s'` to avoid failing with `TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e')`. Can you update your answer? – smci Feb 15 '20 at 07:42
relevant: https://stackoverflow.com/a/22582992/14919621 – Gray Programmerz Jul 20 '22 at 17:20
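Two of the questions above have short answers. A tab-separated file only needs `delimiter='\t'`, not spaces. An array of strings needs `fmt='%s'`, because the default `'%.18e'` only works for numbers. The sketch below shows both; the file name and data are placeholders.

```python
import numpy as np

names = np.array([["alice", "x"], ["bob", "y"]])  # string dtype, not numeric

# '%s' formats each value with str(); '\t' produces a tab-separated file.
np.savetxt("names.tsv", names, fmt="%s", delimiter="\t")

# Reading it back as strings requires dtype=str.
back = np.loadtxt("names.tsv", dtype=str, delimiter="\t")
print(back)
```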

score 230 · Answer 2 · edited Jul 13 '23 at 07:38

Use the pandas library's DataFrame.to_csv. It does take some extra memory, but it's very fast and easy to use.

import numpy as np
import pandas as pd

np_array = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # your 2-D array
df = pd.DataFrame(np_array)
df.to_csv("path/to/file.csv")

If you don't want a header or index, use:

df.to_csv("path/to/file.csv", header=False, index=False)
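To get a named header row and control the number formatting, the `columns=` argument of `DataFrame` and the `float_format=` argument of `to_csv` can be combined, as in the sketch below. The column names `x`, `y` and the file name are placeholders.

```python
import numpy as np
import pandas as pd

np_array = np.array([[1.23456, 2.5], [3.0, 4.75]])

# columns= names the header row; float_format applies a printf-style
# format to every float cell.
df = pd.DataFrame(np_array, columns=["x", "y"])
df.to_csv("labeled.csv", index=False, float_format="%.2f")

with open("labeled.csv") as f:
    print(f.read())
```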

edited Jul 13 '23 at 07:38

Mateen Ulhaq

answered Dec 12 '16 at 08:38

maxbellec

I find it again and again that the best csv exports are when 'piped' into pandas' to_csv – mork Apr 02 '17 at 08:03
Not good. This creates a df and consumes extra memory for nothing – Tex May 31 '17 at 23:05
worked like charm, it's very fast - tradeoff for extra memory usage. parameters `header=None, index=None` remove header row and index column. – thepunitsingh Nov 24 '17 at 06:39
Works for lists with strings, too. – circuitdesigner5172 Feb 15 '18 at 17:24
The `numpy.savetxt` method is great, but it puts a hash symbol at the start of the header line. – Dave C Dec 12 '18 at 16:35
@DaveC : You have to set the `comments` keyword argument to `''`, the `#` will be suppressed. – Milind R Jan 14 '19 at 20:31
`index=False` not `index=None`, https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html – rosefun Aug 25 '20 at 23:58
@maxbellec It gives: ValueError: Must pass 2-d input – Abhilash Singh Chauhan Mar 06 '21 at 10:21
@AbhilashSinghChauhan Well, yes: CSV data is 2-dimensional (rows and columns). – maxbellec Mar 11 '21 at 22:06
@maxbellec I know that CSV data is 2-D, but when I read a raster as a NumPy dataset, it adds the layer count as an extra dimension and the dataset becomes 3-D. Even if the raster has a single layer, the data still has shape (count, height, width) with count = 1. How do I export that data to a .csv or .txt file? – Abhilash Singh Chauhan Mar 12 '21 at 08:26
See [this answer](https://stackoverflow.com/a/56101795) for why this may be helpful: You can set the decimal separator in pandas on export. – schade96 Jul 18 '21 at 12:01
In case it's not a NumPy array but a list of lists (a matrix), e.g. `m1 = [ [1,2,3], [4,5,6], [7,8,9] ]`, this answer works like a charm as well! – M.K Nov 04 '22 at 15:57
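The comments above describe a 3-D array of shape (1, height, width). If the leading axis really has length 1, one option is to drop it with `squeeze(axis=0)` before building the DataFrame. The `raster` array below is a placeholder, not data read from a real raster file.

```python
import numpy as np
import pandas as pd

# Placeholder for a single-band raster: shape (count, height, width), count == 1
raster = np.arange(6, dtype=float).reshape(1, 2, 3)

# squeeze(axis=0) removes the length-1 band axis (it raises ValueError
# if that axis does not have length 1), leaving a 2-D (height, width) array.
band = raster.squeeze(axis=0)

pd.DataFrame(band).to_csv("band.csv", header=False, index=False)
```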

score 60 · Answer 3 · edited Jan 07 '18 at 17:57

tofile is a convenient function to do this:

import numpy as np
a = np.asarray([ [1,2,3], [4,5,6], [7,8,9] ])
a.tofile('foo.csv',sep=',',format='%10.5f')

The documentation has some useful notes:

This is a convenience function for quick storage of array data. Information on endianness and precision is lost, so this method is not a good choice for files intended to archive data or transport data between machines with different endianness. Some of these problems can be overcome by outputting the data as text files, at the expense of speed and file size.

Note: this function does not produce multi-line CSV files; it writes everything on one line.
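Because `tofile` does not store the shape, reading the file back means parsing the flat values and reshaping them yourself. The sketch below assumes the original shape is known and uses `np.fromfile` with `sep=","` to read the text back.

```python
import numpy as np

a = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a.tofile("foo.csv", sep=",", format="%d")  # all values end up on one line

# The shape is not stored in the file, so reshape after reading the flat values.
b = np.fromfile("foo.csv", dtype=int, sep=",").reshape(a.shape)
```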

edited Jan 07 '18 at 17:57

YakovL

answered May 12 '15 at 11:37

Lee

As far as I can tell, this does not produce a csv file, but puts everything on a single line. – Peter Jan 14 '16 at 18:46
@Peter, good point, thanks, I've updated the answer. For me it does save ok in csv format (albeit limited to one line). Also, it's clear that the asker's intent is to "dump it in human-readable format" - so I think the answer is relevant and useful. – Lee Jan 15 '16 at 10:35
Actually, np.savetxt() provides the newline argument, not np.tofile() – eaydin Aug 26 '18 at 00:48

score 24 · Answer 4 · answered Oct 26 '18 at 03:40

As already discussed, the best way to dump the array into a CSV file is the .savetxt(...) method. However, there are a few things to know to use it properly.

For example, if you have a numpy array with dtype = np.int32 as

narr = np.array([[1,2],
                 [3,4],
                 [5,6]], dtype=np.int32)

and want to save using savetxt as

np.savetxt('values.csv', narr, delimiter=",")

It will store the data in floating point exponential format as

1.000000000000000000e+00,2.000000000000000000e+00
3.000000000000000000e+00,4.000000000000000000e+00
5.000000000000000000e+00,6.000000000000000000e+00

You will have to change the formatting by using a parameter called fmt as

np.savetxt('values.csv', narr, fmt="%d", delimiter=",")

to store the data in its original integer format.

Saving Data in Compressed gz format

savetxt can also store data in compressed .gz format, which can be useful when transferring data over a network.

Just give the file a .gz extension and NumPy compresses it automatically:

np.savetxt('values.gz', narr, fmt="%d", delimiter=",")
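The compressed file can be read back without decompressing it manually, since `np.loadtxt` also handles `.gz` files transparently. The sketch below also reads the raw text with Python's `gzip` module to show that the file contains ordinary CSV lines.

```python
import gzip
import numpy as np

narr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)

# A filename ending in .gz makes savetxt gzip-compress the output.
np.savetxt("values.gz", narr, fmt="%d", delimiter=",")

# loadtxt decompresses .gz files transparently...
restored = np.loadtxt("values.gz", delimiter=",", dtype=np.int32)

# ...and the standard gzip module can read the CSV text as well.
with gzip.open("values.gz", "rt") as f:
    text = f.read()
```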

Hope it helps

The `fmt="%d"` was what I was looking for. Thank you! – payne Dec 23 '18 at 01:47

Mike T · Answer 5 · 2021-03-25T22:03:49.453

Writing record arrays as CSV files with headers requires a bit more work.

This example reads from a CSV file (example.csv) and writes its contents to another CSV file (out.csv).

import numpy as np

# Write an example CSV file with headers on first line
with open('example.csv', 'w') as fp:
    fp.write('''\
col1,col2,col3
1,100.1,string1
2,222.2,second string
''')

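# Read it as a NumPy record array (np.recfromcsv was removed in NumPy 2.0;
# np.genfromtxt with names=True reads the header line the same way)
ar = np.genfromtxt('example.csv', delimiter=',', names=True, dtype=None,
                   encoding='ascii').view(np.recarray)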
print(repr(ar))
# rec.array([(1, 100.1, 'string1'), (2, 222.2, 'second string')], 
#           dtype=[('col1', '<i8'), ('col2', '<f8'), ('col3', '<U13')])

# Write as a CSV file with headers on first line
with open('out.csv', 'w') as fp:
    fp.write(','.join(ar.dtype.names) + '\n')
    np.savetxt(fp, ar, '%s', ',')

Note that the above example cannot handle values which are strings with commas. To always enclose non-numeric values within quotes, use the csv built-in module:

import csv

with open('out2.csv', 'w', newline='') as fp:
    writer = csv.writer(fp, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(ar.dtype.names)
    writer.writerows(ar.tolist())
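The comment on this answer suggests pandas as an alternative. `DataFrame.to_csv` quotes fields that contain the delimiter by default, so a string with a comma survives a round trip. The sketch below builds the structured array directly instead of reading `example.csv`, and its `"a, b"` value is a placeholder chosen to contain a comma.

```python
import csv
import numpy as np
import pandas as pd

ar = np.array([(1, 100.1, "string1"), (2, 222.2, "a, b")],
              dtype=[("col1", "i8"), ("col2", "f8"), ("col3", "U13")])

# The structured array's field names become the DataFrame's column names.
pd.DataFrame(ar).to_csv("out3.csv", index=False)

# Read it back with the csv module to check the quoting.
with open("out3.csv", newline="") as f:
    rows = list(csv.reader(f))
```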

This is where pandas again helps. You can do: pd.DataFrame(out, columns=['col1', 'col2']), etc — EFreak, May 11 '20 at 21:51

score 9 · Answer 6 · edited May 12 '22 at 20:00

To store a NumPy array in a text file, use savetxt from NumPy.

Assuming your NumPy array is named train_df:

import numpy as np
np.savetxt('train_df.txt', train_df, fmt='%s')

OR

from numpy import savetxt
savetxt('train_df.txt', train_df, fmt='%s')

edited May 12 '22 at 20:00

Ege Kaan Gürkan

answered Jul 25 '21 at 14:49

Hemang Dhanani

Since you are calling `np.savetxt(...)`, you don't need the import `from numpy import savetxt`. If you do import it, you can simply call `savetxt(...)`. – Atybzz Jan 20 '22 at 19:29

score 8 · Answer 7 · edited Feb 11 '19 at 21:27

I believe you can also accomplish this quite simply as follows:

Convert Numpy array into a Pandas dataframe
Save as CSV

Step 1:

    # Libraries to import
    import pandas as pd
    import numpy as np

    # N x N NumPy array (any 2-D shape works)
    corr_mat = np.random.rand(3, 3)  # stand-in for your NumPy array
    my_df = pd.DataFrame(corr_mat)   # convert it to a pandas DataFrame

Step 2:

    # save as CSV
    my_df.to_csv('foo.csv', index=False)   # "foo" is the name you want to give
                                           # the CSV file; make sure to add ".csv"
                                           # after the name, as in the code

No need for a remake, [the original](https://stackoverflow.com/a/41096943/774575) is crisp and clear. — mins, Jan 19 '21 at 20:12

score 5 · Answer 8 · answered Mar 07 '17 at 10:49

If you want to write all the values in a single column:

    for x in np.nditer(a.T, order='C'):
        file.write(str(x))
        file.write("\n")

Here `a` is the NumPy array and `file` is an open file object to write to.

If you want to write all the values in a single row:

    import csv

    writer = csv.writer(file, delimiter=',')
    row = []
    for x in np.nditer(a.T, order='C'):
        row.append(str(x))
    writer.writerow(row)
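The following is a self-contained sketch of both variants for a 2-D array. It uses `a.ravel(order="F")` in place of `np.nditer`; both yield the values column by column. The file names are placeholders.

```python
import csv
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# One value per line, column by column (1, 4, 2, 5, 3, 6).
with open("column.txt", "w") as file:
    for x in a.ravel(order="F"):
        file.write(f"{x}\n")

# All values on one comma-separated row, in the same order.
with open("row.csv", "w", newline="") as file:
    csv.writer(file).writerow(a.ravel(order="F").tolist())
```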

score 4 · Answer 9 · answered Nov 08 '18 at 11:48

In Python, the csv module's csv.writer() function writes data to CSV files. It is the counterpart of csv.reader().

import csv

person = [['SN', 'Person', 'DOB'],
          ['1', 'John', '18/1/1997'],
          ['2', 'Marie', '19/2/1998'],
          ['3', 'Simon', '20/3/1999'],
          ['4', 'Erik', '21/4/2000'],
          ['5', 'Ana', '22/5/2001']]

csv.register_dialect('myDialect',
                     delimiter='|',
                     quoting=csv.QUOTE_NONE,
                     skipinitialspace=True)

# newline='' lets the csv module control line endings;
# the with block closes the file, so no explicit f.close() is needed
with open('dob.csv', 'w', newline='') as f:
    writer = csv.writer(f, dialect='myDialect')
    for row in person:
        writer.writerow(row)

A delimiter is a one-character string used to separate fields. The default is a comma (,).
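The question is about a NumPy array, so here is the same approach applied to one. `arr.tolist()` turns a 2-D array into a list of row lists of plain Python numbers, which `writer.writerows` accepts directly. The pipe delimiter and header names are illustrative only.

```python
import csv
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

with open("arr.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="|")
    writer.writerow(["a", "b", "c"])   # optional header row
    writer.writerows(arr.tolist())     # one CSV row per array row
```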

This has already been suggested: https://stackoverflow.com/a/41009026/8881141 Please only add new approaches, don't repeat previously published suggestions. — Mr. T, Nov 08 '18 at 12:16

Mr Poin · Answer 10 · 2016-10-17T17:23:37.400

If you want to save your numpy array (e.g. your_array = np.array([[1,2],[3,4]])) to one cell, you could convert it first with your_array.tolist().

Then save it to one cell the normal way, with delimiter=';', and the cell in the CSV file will look like this: [[1, 2], [3, 4]]

Then you can restore your array like this (after import ast): your_array = np.array(ast.literal_eval(cell_string))
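The full round trip can be sketched with the csv module, assuming a row that holds a hypothetical `id1` label followed by the array cell. The semicolon delimiter matters because the list text itself contains commas.

```python
import ast
import csv
import numpy as np

your_array = np.array([[1, 2], [3, 4]])

# Store the whole array as the text of a single cell ("id1" is a hypothetical label).
with open("cell.csv", "w", newline="") as f:
    csv.writer(f, delimiter=";").writerow(["id1", str(your_array.tolist())])

# Read the cell back and rebuild the array.
with open("cell.csv", newline="") as f:
    cell_string = next(csv.reader(f, delimiter=";"))[1]
restored = np.array(ast.literal_eval(cell_string))
```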

edited Oct 17 '16 at 17:23

answered Oct 17 '16 at 16:50

Mr Poin

well that is literally going to destroy all the memory savings for using a numpy array – PirateApp Apr 16 '18 at 08:00

score 2 · Answer 11 · edited Sep 29 '17 at 01:39

You can also do it with pure Python without using any modules (here `array` is your 2-D NumPy array).

# format as a block of CSV text (works for any number of columns)
csv_rows = [",".join(str(v) for v in row) for row in array]
csv_text = "\n".join(csv_rows)

# write it to a file
with open('file.csv', 'w') as f:
    f.write(csv_text)

edited Sep 29 '17 at 01:39

Hemen Ashodia

answered Sep 07 '17 at 07:05

Greg

This uses **a lot of memory**. Prefer looping over each row, formatting and writing it. – remram Oct 02 '17 at 13:01
@remram it depends on your data, but yes if it is big it can use a lot of memory – Greg Oct 02 '17 at 23:49
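The row-by-row approach suggested in the comments can look like the sketch below. It never holds the whole CSV text in memory at once. The random 1000×4 array is placeholder data.

```python
import numpy as np

array = np.random.default_rng(0).random((1000, 4))  # placeholder data

# Format and write one row at a time instead of building the whole text first.
with open("file.csv", "w") as f:
    for row in array:
        f.write(",".join(str(v) for v in row) + "\n")
```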

score 2 · Answer 12 · answered May 21 '22 at 11:48

The numpy.savetxt() function saves a NumPy array to a text file, but by default it uses scientific notation.

If you'd like to avoid this, specify an appropriate format with the fmt argument. For example,

import numpy as np

arr = np.array([[1.5, 2.25], [3.0, 4.125]])  # your array
np.savetxt('output.csv', arr, delimiter=',', fmt='%f')

score 0 · Answer 13 · answered Apr 05 '23 at 04:47

As other answers mention, it's important to pass fmt= to save a "human-readable" file. In fact, if you pass a separate format for each column in a single fmt string, you don't need to pass a delimiter.

import numpy as np

arr = np.arange(9).reshape(3, 3)
np.savetxt('out.csv', arr, fmt='%f,%.2f,%.1f')

It saves a file whose contents look like:

0.000000,1.00,2.0
3.000000,4.00,5.0
6.000000,7.00,8.0

To read the file back, use np.loadtxt():

np.loadtxt('out.csv', delimiter=',')

If you want to append to an existing file (or create it if it doesn't exist yet), use a context manager and open the file with mode='ab'.

with open('out.csv', 'ab') as f:
    np.savetxt(f, arr, delimiter=',', fmt='%.1f')
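The append pattern also works when writing one file in several chunks, for example results produced in a loop. The sketch below checks that `np.loadtxt` sees one combined table. The chunk contents and file name are arbitrary.

```python
import os
import numpy as np

if os.path.exists("chunks.csv"):
    os.remove("chunks.csv")  # start fresh so reruns don't keep appending

for start in range(0, 9, 3):
    chunk = np.arange(start, start + 3, dtype=float).reshape(1, 3)
    # 'ab' creates the file on the first pass and appends afterwards.
    with open("chunks.csv", "ab") as f:
        np.savetxt(f, chunk, delimiter=",", fmt="%.1f")

table = np.loadtxt("chunks.csv", delimiter=",")
```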
