96

How do I read the following CSV file?

1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3

How do I write the following data to a CSV file?

data = [
    (1, "A towel,", 1.0),
    (42, " it says, ", 2.0),
    (1337, "is about the most ", -1),
    (0, "massively useful thing ", 123),
    (-2, "an interstellar hitchhiker can have.", 3),
]
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
  • 6
    This is intended to be a [canonical question](http://meta.stackoverflow.com/a/291994/562769) as I just found a lot of duplicates which frame the question differently, but are essentially this question. – Martin Thoma Jan 11 '17 at 07:32
  • Examples for dupes: http://stackoverflow.com/questions/5788521/reading-a-csv-file-using-python http://stackoverflow.com/questions/26903304/reading-data-from-a-csv-file-in-python http://stackoverflow.com/questions/1593318/how-to-read-a-csv-file-with-python http://stackoverflow.com/questions/24662571/python-import-csv-to-list http://stackoverflow.com/questions/34568774/reading-a-csv-file-using-python-3 http://stackoverflow.com/questions/14725020/read-csv-file-from-python http://stackoverflow.com/questions/16283799/how-to-read-a-csv-file-from-a-url-python ... – Martin Thoma Jan 11 '17 at 07:35
  • See also: [Creating a dictionary from a CSV file](https://stackoverflow.com/questions/14091387/creating-a-dictionary-from-a-csv-file) – mkrieger1 Jun 14 '22 at 22:31

7 Answers7

108

Here are some minimal complete examples how to read CSV files and how to write CSV files with Python.

Pure Python:

import csv

# Define data
data = [
    (1, "A towel,", 1.0),
    (42, " it says, ", 2.0),
    (1337, "is about the most ", -1),
    (0, "massively useful thing ", 123),
    (-2, "an interstellar hitchhiker can have.", 3),
]

# Write CSV file
with open("test.csv", "wt") as fp:
    writer = csv.writer(fp, delimiter=",")
    # writer.writerow(["your", "header", "foo"])  # write header
    writer.writerows(data)

# Read CSV file
with open("test.csv") as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    # next(reader, None)  # skip the headers
    data_read = [row for row in reader]

print(data_read)

After that, the contents of data_read are

[['1', 'A towel,', '1.0'],
 ['42', ' it says, ', '2.0'],
 ['1337', 'is about the most ', '-1'],
 ['0', 'massively useful thing ', '123'],
 ['-2', 'an interstellar hitchhiker can have.', '3']]

Please note that CSV reads only strings. You need to convert to the column types manually.

A Python 2+3 version was here before (link), but Python 2 support is dropped. Removing the Python 2 stuff massively simplified this answer.

Related

mpu

Have a look at my utility package mpu for a super simple and easy to remember one:

import mpu.io
data = mpu.io.read('example.csv', delimiter=',', quotechar='"', skiprows=None)
mpu.io.write('example.csv', data)

Pandas

import pandas as pd

# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
df = pd.read_csv('myfile.csv', sep=',')
print(df)

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts
dicts = df.to_dict().values()

See read_csv docs for more information. Please note that pandas automatically infers if there is a header line, but you can set it manually, too.

If you haven't heard of Seaborn, I recommend having a look at it.

Other

Reading CSV files is supported by a bunch of other libraries, for example:

Created CSV file

1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3

Common file endings

.csv

Working with the data

After reading the CSV file to a list of tuples / dicts or a Pandas dataframe, it is simply working with this kind of data. Nothing CSV specific.

Alternatives

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python

Pranav Hosangadi
  • 23,755
  • 7
  • 44
  • 70
Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
  • 2
    @icedwater This is a possibility. However, I prefer Pandas: (1) It automatically deals with headers (2) it loads the file directly from the path and does not expect a file pointer (3) it has better "export" options (like the dict export - yes, you can do that with CSV, too. But Pandas is simpler). But feel free to post a solution with does not need Pandas :-) – Martin Thoma Jan 11 '17 at 08:02
  • Thanks, I was wondering because you used `csv` for writing. I would prefer `csv` or `pandas` for both, and `csv` over `pandas` because it is more likely to already be there. – icedwater Jan 11 '17 at 08:22
  • 2
    @icedwater Ok, I've added a pure `csv` solution (which is now also consistent in structure with my other answers to the other file formats like YAML and JSON) – Martin Thoma Feb 10 '17 at 16:17
  • For Python 3, you should *always* open CSV files with `newline=""` as per the [documentation](https://docs.python.org/3/library/csv.html#csv.reader) — so I think you should change this answer to Your "canonical" question accordingly. – martineau Jun 18 '22 at 17:14
  • For small files the time for importing the `pandas` package can be larger than running the complete pure Python solution. In my case the threshold was at about 50 MB. – Stefan Schmidt May 15 '23 at 13:49
2

Writing a CSV file

First you need to import csv

For example:

import csv

with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
mkrieger1
  • 19,194
  • 5
  • 54
  • 65
2

If needed- read a csv file without using the csv module:

rows = []
with open('test.csv') as f:
    for line in f:
        # strip whitespace
        line = line.strip()
        # separate the columns
        line = line.split(',')
        # save the line for use later
        rows.append(line)
wwii
  • 23,232
  • 7
  • 37
  • 77
  • Nice, quick and dirty solution, for CSV's that do not have quoted values. Without too much further trouble, you could read the first line first (the one with the column names), split it on commas, and then append `dict(zip(headers, line))` to get a nice collection of labeled data. – PaulMcG Feb 22 '22 at 05:24
2

If you are working with CSV data and want a solution with a smaller footprint than pandas, you can try my package, littletable. Can be pip-installed, or just dropped in as a single .py file with your own code, so very portable and suitable for serverless apps.

Reading CSV data is as simple as calling csv_import:

data = """\
1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3"""

import littletable as lt
tbl = lt.Table().csv_import(data, fieldnames="number1,words,number2".split(','))
tbl.present()

Prints:

  Number1   Words                                  Number2  
 ────────────────────────────────────────────────────────── 
  1         A towel,                               1.0      
  42         it says,                              2.0      
  1337      is about the most                      -1       
  0         massively useful thing                 123      
  -2        an interstellar hitchhiker can have.   3    

(littletable uses the rich module for presenting Tables.)

littletable doesn't automatically try to convert numeric data, so a numeric transform function is needed for the numeric columns.

def get_numeric(s):
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return s

tbl = lt.Table().csv_import(
    data,
    fieldnames="number1,words,number2".split(','),
    transforms={}.fromkeys("number1 number2".split(), get_numeric)
)
tbl.present()

This gives:

  Number1   Words                                  Number2  
 ────────────────────────────────────────────────────────── 
        1   A towel,                                   1.0  
       42    it says,                                  2.0  
     1337   is about the most                           -1  
        0   massively useful thing                     123  
       -2   an interstellar hitchhiker can have.         3  

The numeric columns are right-justified instead of left-justified.

littletable also has other ORM-ish features, such as indexing, joining, pivoting, and full-text search. Here is a table of statistics on the numeric columns:

tbl.stats("number1 number2".split()).present()

  Name       Mean   Min    Max   Variance              Std_Dev   Count   Missing  
 ──────────────────────────────────────────────────────────────────────────────── 
  number1   275.6    -2   1337   352390.3    593.6247130974249       5         0  
  number2    25.6    -1    123     2966.8   54.468339427597755       5         0  

or transposed:

tbl.stats("number1 number2".split(), by_field=False).present()

  Stat                 Number1              Number2 
 ─────────────────────────────────────────────────── 
  mean                   275.6                 25.6
  min                       -2                   -1
  max                     1337                  123
  variance            352390.3               2966.8
  std_dev    593.6247130974249   54.468339427597755
  count                      5                    5
  missing                    0                    0

Other formats can be output too, such as Markdown:

print(tbl.stats("number1 number2".split(), by_field=False).as_markdown())

| stat | number1 | number2 |
|---|---:|---:|
| mean | 275.6 | 25.6 |
| min | -2 | -1 |
| max | 1337 | 123 |
| variance | 352390.3 | 2966.8 |
| std_dev | 593.6247130974249 | 54.468339427597755 |
| count | 5 | 5 |
| missing | 0 | 0 |

Which would render from Markdown as

stat number1 number2
mean 275.6 25.6
min -2 -1
max 1337 123
variance 352390.3 2966.8
std_dev 593.6247130974249 54.468339427597755
count 5 5
missing 0 0

Lastly, here is a text search on the words for any entry with the word "hitchhiker":

tbl.create_search_index("words")
for match in tbl.search.words("hitchhiker"):
    print(match)

Prints:

   namespace(number1=-2, words='an interstellar hitchhiker can have.', number2=3)
PaulMcG
  • 62,419
  • 16
  • 94
  • 130
0
import csv
with open(fileLocation+'example.csv',newline='') as File: #the csv file is stored in a File object

    reader=csv.reader(File)       #csv.reader is used to read a file
    for row in reader:
        print(row)
0

To read a csv file using Pandas

use pd.read_csv("D:\\sample.csv")

using only python :

fopen=open("D:\\sample.csv","r") 

print(fopen.read())

To create and write into a csv file

The below example demonstrate creating and writing a csv file. to make a dynamic file writer we need to import a package import csv, then need to create an instance of the file with file reference Ex:

with open("D:\sample.csv","w",newline="") as file_writer

Here if the file does not exist with the mentioned file directory then python will create a same file in the specified directory, and w represents write, if you want to read a file then replace w with r or to append to existing file then a.

newline="" specifies that it removes an extra empty row for every time you create row so to eliminate empty row we use newline="", create some field names(column names) using list like:

fields=["Names","Age","Class"]

Then apply to writer instance like:

writer=csv.DictWriter(file_writer,fieldnames=fields)

Here using Dictionary writer and assigning column names, to write column names to csv we use writer.writeheader() and to write values we use writer.writerow({"Names":"John","Age":20,"Class":"12A"}) ,while writing file values must be passed using dictionary method , here the key is column name and value is your respective key value.

Import csv:

with open("D:\sample.csv","w",newline="") as file_writer:

fields=["Names","Age","Class"]

writer=csv.DictWriter(file_writer,fieldnames=fields)

writer.writeheader()

writer.writerow({"Names":"John","Age":21,"Class":"12A"})
Calos
  • 1,783
  • 19
  • 28
0

I wrote a question asking something similar. So, to keep everything in one place, here are my 2 cents for a very quick & dirty solution.

This code is meant to read from one CSV file and write to another. The format of the output row is fixed, I could have used csv.write with the correct delimiter, but in this case I would have to do extra work to specify the blanks. But it works nicely to show how to output text using the good old print() function:

#! /usr/bin/env python3

def main():
    parser = argparse.ArgumentParser(
        description='',
        usage="""myparser [-h] print this help and exit
        """,
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('-f', '--file',help='CSV input file', required=True)
  
    args = parser.parse_args()

    with open("output.file", "w") as outfile:

        with open(args.file) as csvfile:
       
            csv_reader = csv.reader(csvfile, delimiter=',')
            line_count = 0
            for row in csv_reader:
                if line_count == 0:
                    line_count += 1
               elif args.archive:
                    print(f'no:{row[0]}:{row[1]}::0:0:0:/bin/bash:0:0:{row[2]}:{row[3]}:{row[4]}:archive', file=outfile)
                    line_count += 1 

return sys.exit(EXIT_SUCCESS)

if __name__ == '__main__':
    main()
   

Sorry for the indentation.

This code opens a CSV file for reading and uses the print() function to write a formatted string that will look like:

no:Xoero:ToelAs:xtoelas:0:0:0:/bin/bash:0:0:y.r.j.pols@tue.nl:00311234567890:nl:archive
runlevel0
  • 2,715
  • 2
  • 24
  • 31