85

I have several CSV files that look like this:

Input
Name        Code
blackberry  1
wineberry   2
rasberry    1
blueberry   1
mulberry    2

I would like to add a new column to all CSV files so that it would look like this:

Output
Name        Code    Berry
blackberry  1   blackberry
wineberry   2   wineberry
rasberry    1   rasberry
blueberry   1   blueberry
mulberry    2   mulberry

The script I have so far is this:

import csv
with open('input.csv','r') as csvinput:
    with open('output.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            writer.writerow(row+['Berry'])

(Python 3.2)

But in the output, the script inserts a blank line after every row, and the new column contains only Berry:

Output
Name        Code    Berry
blackberry  1   Berry

wineberry   2   Berry

rasberry    1   Berry

blueberry   1   Berry

mulberry    2   Berry
martineau
fairyberry
  • possible duplicate of [Copy one column to another but with different header](http://stackoverflow.com/questions/11063707/copy-one-column-to-another-but-with-different-header) – Martijn Pieters Jun 17 '12 at 10:12
  • is it possible you only have 'Berry' in your last column because you are only writing 'Berry' to the file? (row+['Berry']) What did you expect to write? – Dhara Jun 17 '12 at 10:16
  • @Dhara: I would like to have Berry as a header and Name column value as row value for the Berry. See above. – fairyberry Jun 17 '12 at 10:31
  • You can also use a pandas DataFrame, as suggested on this [page](https://stackoverflow.com/questions/33139513/python-pandas-insert-column) – Hemanth Kumar Oct 12 '18 at 04:58

11 Answers

107

This should give you an idea of what to do:

>>> v = open('C:/test/test.csv')
>>> r = csv.reader(v)
>>> row0 = r.next()
>>> row0.append('berry')
>>> print row0
['Name', 'Code', 'berry']
>>> for item in r:
...     item.append(item[0])
...     print item
...     
['blackberry', '1', 'blackberry']
['wineberry', '2', 'wineberry']
['rasberry', '1', 'rasberry']
['blueberry', '1', 'blueberry']
['mulberry', '2', 'mulberry']
>>> 

Edit: note that in Python 3 you must use next(r) instead of r.next().

Thanks for accepting the answer. Here you have a bonus (your working script):

import csv

with open('C:/test/test.csv','r') as csvinput:
    with open('C:/test/output.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)

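        rows = []  # collect every row so they can be written in one call
        row = next(reader)  # header row
        row.append('Berry')
        rows.append(row)

        for row in reader:
            row.append(row[0])  # copy the Name value into the new column
            rows.append(row)

        writer.writerows(rows)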

Please note

  1. the lineterminator parameter in csv.writer. By default it is '\r\n'; when the output file is opened in text mode on Windows, the '\n' is translated to '\r\n' once more, which is why you get double spacing. In Python 3 you can alternatively open the output file with newline=''.
  2. the use of a list to collect all the rows and write them in one shot with writerows. If your file is very, very big this is probably not a good idea (RAM), but for normal files I think it is faster because there is less I/O.
  3. As indicated in the comments to this post, instead of nesting the two with statements you can put them on the same line:

    with open('C:/test/test.csv','r') as csvinput, open('C:/test/output.csv', 'w') as csvoutput:
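The line-terminator behaviour described in note 1 can be checked without touching the disk. The following is a minimal sketch that writes one row to an in-memory buffer; `render_row` is a hypothetical helper name used only for illustration, not part of the csv module.

```python
import csv
import io


def render_row(row, **writer_kwargs):
    """Return the exact text csv.writer emits for a single row."""
    buffer = io.StringIO(newline='')  # newline='' disables newline translation
    csv.writer(buffer, **writer_kwargs).writerow(row)
    return buffer.getvalue()


# Default dialect: the row ends with '\r\n'
print(repr(render_row(['blackberry', '1', 'blackberry'])))
# With lineterminator='\n': the row ends with a bare '\n'
print(repr(render_row(['blackberry', '1', 'blackberry'], lineterminator='\n')))
```

When a real file is opened in text mode on Windows without newline='', the trailing '\n' is expanded to '\r\n' again, so each row ends in '\r\r\n', which shows up as an empty line between rows.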

joaquin
  • thanks for the note. I tried and it gives me attribute error: '_csv.reader' object has no attribute 'next'. Do you have any idea? – fairyberry Jun 17 '12 at 10:44
  • I see you are in py3k. then you must use next(r) instead of r.next() – joaquin Jun 17 '12 at 10:52
  • @joaquin: OMG. Thanks for the bonus!! – fairyberry Jun 25 '12 at 13:44
  • 7
    Note: instead of nesting `with` statements, you can do it at the same line separating them with a comma e.g.: `with open(input_filename) as input_file, open(output_filename, 'w') as output_file` – Caumons Jun 30 '16 at 15:51
  • @Caumons You are right and this would be nowadays the way to go. Note my answer tried to keep the OP code structure to focus on the solution to his problem. – joaquin Jul 01 '16 at 05:37
  • This answer puts all of the input file into memory in order to write it all at once using writerows, which is fine for decently sized files, but can explode for larger ones. – pedrostrusso Nov 21 '19 at 12:24
81

I'm surprised no one suggested Pandas. Although pulling in a dependency like Pandas might seem more heavy-handed than necessary for such an easy task, it produces a very short script, and Pandas is a great library for all sorts of CSV (and really any kind of data) manipulation. Can't argue with 4 lines of code:

import pandas as pd
csv_input = pd.read_csv('input.csv')
csv_input['Berries'] = csv_input['Name']
csv_input.to_csv('output.csv', index=False)

Check out the Pandas website for more information!

Contents of output.csv:

Name,Code,Berries
blackberry,1,blackberry
wineberry,2,wineberry
rasberry,1,rasberry
blueberry,1,blueberry
mulberry,2,mulberry
Jough Dempsey
Blairg23
  • Thanks @Jough Dempsey! – Blairg23 Feb 28 '17 at 08:27
  • 1
    How to update or add new column in same csv?? input.csv?? – Ankit Maheshwari Nov 04 '19 at 06:31
  • 1
    @AnkitMaheshwari, change the name of `output.csv` in this example to `input.csv`. It will do the same thing, but output to `input.csv`. – Blairg23 Nov 05 '19 at 20:46
  • @Blairg23 Thanks, but it replaces the content with the updated one. – Ankit Maheshwari Nov 06 '19 at 05:35
  • 1
    @AnkitMaheshwari Yes... that is the intended functionality. You want to replace the old content (the content with `Name` and `Code`) with the new content which has the same two columns from the old content PLUS a new column with `Berries`, as the OP asked. – Blairg23 Nov 11 '19 at 22:52
  • 1
    A word of caution: Pandas is great for decently sized files. This answer will load all the data into memory, which can be troublesome for large files. – pedrostrusso Nov 19 '19 at 21:34
  • 2
    @pedrostrusso But unless you're loading 4-16 gb files, you should be good on RAM. Unless you use a potato. – Blairg23 Nov 21 '19 at 22:54
  • @Blairg23 Pandas can't deal well with larger than memory files, as in the case of genomic data, where files can easily go upwards of 100GB. Which was what originally brought me to this question. – pedrostrusso Nov 22 '19 at 12:46
  • @pedrostrusso That makes sense. Neither can any other library. In fact, you'd be hard-pressed to find any program that can deal with files that are larger than your memory space... how would they load them into memory to write to them? – Blairg23 Nov 23 '19 at 20:45
  • you wouldn't; it would be much better to read line per line (and thus placing only each individual line into memory at a time) and processing each line at a time, as @jgritty's answer does below. – pedrostrusso Nov 25 '19 at 12:14
  • @pedrostrusso I don't believe that is the case. When you do the line `with open('input.csv','r') as csvinput:`, you are opening the entire file for reading. The `open()` file command opens the file. Are you sure it doesn't put the entire file into memory at that point? The documentation seems to say that it opens the entire file: https://docs.python.org/3.8/library/functions.html#open – Blairg23 Nov 25 '19 at 21:41
  • The real answer to this problem is using a file stream buffer to stream in a line at a time (and write to a new file at the same time), if you need a large file to be read in place. – Blairg23 Nov 25 '19 at 21:43
  • I'm not sure where you're saying the docs say that. The documentation you linked to say that the `open()` command returns a file object, which is "An object exposing a file-oriented API (with methods such as read() or write()) to an underlying resource." That can then used by the csv.reader function (an iterator), to process each individual line, and then, as you said, write each line into a new file. – pedrostrusso Nov 26 '19 at 12:37
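The point made in the last comments, that csv.reader pulls lines from its input one at a time instead of loading everything up front, can be illustrated with a small sketch. `counting_lines` is a hypothetical helper that stands in for a file object and records how many lines have been requested so far.

```python
import csv


def counting_lines(lines, counter):
    """Yield lines one by one, recording how many have been consumed."""
    for line in lines:
        counter['consumed'] += 1
        yield line


data = ['Name,Code\n', 'blackberry,1\n', 'wineberry,2\n', 'rasberry,1\n']
counter = {'consumed': 0}
reader = csv.reader(counting_lines(data, counter))

header = next(reader)     # pulls only the first line
first_row = next(reader)  # pulls only the second line
print(header, first_row, counter['consumed'])
```

A real file object reads from disk in buffered blocks rather than strictly line by line, but it still does not load the whole file when it is opened.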
18
import csv
with open('input.csv','r') as csvinput:
    with open('output.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)

        for row in csv.reader(csvinput):
            if row[0] == "Name":
                writer.writerow(row+["Berry"])
            else:
                writer.writerow(row+[row[0]])

Maybe something like that is what you intended?

Also, CSV stands for comma-separated values, so I think you need commas to separate your values, like this:

Name,Code
blackberry,1
wineberry,2
rasberry,1
blueberry,1
mulberry,2
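Since the question mentions several CSV files, the row-by-row approach in this answer can be wrapped in a function and applied to each file in turn. This is a minimal sketch, assuming comma-separated input with a header row; the names `add_berry_column` and `process_directory` and the `_out` suffix are hypothetical choices made for illustration.

```python
import csv
from pathlib import Path


def add_berry_column(src, dst, header='Berry'):
    """Copy src to dst row by row, appending a copy of the first column."""
    # newline='' lets the csv module handle line endings itself
    with open(src, 'r', newline='') as f_in, open(dst, 'w', newline='') as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)
        first = next(reader, None)  # None if the file is empty
        if first is None:
            return
        writer.writerow(first + [header])
        for row in reader:
            if row:  # skip blank lines
                writer.writerow(row + [row[0]])


def process_directory(directory):
    """Write '<name>_out.csv' next to every '<name>.csv' in directory."""
    # sorted() materialises the file list before any output file is created
    for src in sorted(Path(directory).glob('*.csv')):
        if src.stem.endswith('_out'):
            continue  # do not reprocess files produced by an earlier run
        add_berry_column(src, src.with_name(src.stem + '_out.csv'))
```

Calling `process_directory('data')` would then process every CSV file in a directory named `data` (a placeholder path). Only one row is held in memory at a time, so the file size does not matter.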
jgritty
8

Yes, it's an old question, but it might help someone:

import csv
import uuid

# read and write csv files
with open('in_file','r') as r_csvfile:
    with open('out_file','w',newline='') as w_csvfile:

        dict_reader = csv.DictReader(r_csvfile,delimiter='|')
        #add new column with existing
        fieldnames = dict_reader.fieldnames + ['ADDITIONAL_COLUMN']
        writer_csv = csv.DictWriter(w_csvfile,fieldnames,delimiter='|')
        writer_csv.writeheader()


        for row in dict_reader:
            row['ADDITIONAL_COLUMN'] = str(uuid.uuid4().int >> 64) [0:6]
            writer_csv.writerow(row)
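Applied to the data in the question, the same DictReader/DictWriter pattern could look like the sketch below. It assumes a comma-delimited file with `Name` and `Code` headers; `copy_column` and its parameter names are hypothetical.

```python
import csv


def copy_column(in_path, out_path, source='Name', target='Berry'):
    """Add a column `target` whose values are copied from column `source`."""
    with open(in_path, 'r', newline='') as r_csvfile, \
            open(out_path, 'w', newline='') as w_csvfile:
        dict_reader = csv.DictReader(r_csvfile)
        # Keep the existing column order and put the new column last;
        # fieldnames is None for an empty file, hence the `or []`
        fieldnames = list(dict_reader.fieldnames or []) + [target]
        writer_csv = csv.DictWriter(w_csvfile, fieldnames=fieldnames)
        writer_csv.writeheader()
        for row in dict_reader:
            row[target] = row[source]
            writer_csv.writerow(row)
```

Addressing columns by name rather than by position means the script keeps working if the column order in the input changes.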
Tpk43
7

I used pandas and it worked well. In my case, I had to open a file, add some columns to it, and then save it back to the same file.

This code adds multiple columns; you can edit it as much as you need.

import pandas as pd

csv_input = pd.read_csv('testcase.csv')         #reading my csv file
csv_input['Phone1'] = csv_input['Name']         #this would also copy the cell value 
csv_input['Phone2'] = csv_input['Name']
csv_input['Phone3'] = csv_input['Name']
csv_input['Phone4'] = csv_input['Name']
csv_input['Phone5'] = csv_input['Name']
csv_input['Country'] = csv_input['Name']
csv_input['Website'] = csv_input['Name']
csv_input.to_csv('testcase.csv', index=False)   #this writes back to your file

If you don't want the cell values to be copied, first create an empty column in your CSV file manually (say you named it Hours); then you can add this line to the code above:

csv_input['New Value'] = csv_input['Hours']

Or, more simply, without adding the column manually:

csv_input['New Value'] = ''    #simple and easy

I hope it helps.

enigma
3

For adding a new column to an existing CSV file (with headers), if the column to be added has a small enough number of values, here is a convenient function (somewhat similar to @joaquin's solution). The function takes:

  1. The existing CSV filename
  2. The output CSV filename (which will have the updated content)
  3. A list with the header name followed by the column values

import csv

def add_col_to_csv(csvfile, fileout, new_list):
    with open(csvfile, 'r', newline='') as read_f, \
            open(fileout, 'w', newline='') as write_f:
        csv_reader = csv.reader(read_f)
        csv_writer = csv.writer(write_f)
        # Pair each row with the value at the same position in new_list
        for i, row in enumerate(csv_reader):
            row.append(new_list[i])
            csv_writer.writerow(row)

Example:

new_list1 = ['test_hdr',4,4,5,5,9,9,9]
add_col_to_csv('exists.csv','new-output.csv',new_list1)
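One thing to watch with a function like this is that `new_list[i]` raises an IndexError when the list has fewer entries than the file has rows. A hedged variant, assuming missing values should simply be left empty and surplus values ignored, pairs rows and values with itertools.zip_longest; `add_col_padded` and its `fill` parameter are hypothetical names.

```python
import csv
from itertools import zip_longest


def add_col_padded(csvfile, fileout, new_list, fill=''):
    """Append new_list as a column, padding with `fill` if it runs out."""
    with open(csvfile, 'r', newline='') as read_f, \
            open(fileout, 'w', newline='') as write_f:
        csv_writer = csv.writer(write_f)
        missing = object()  # sentinel marking whichever side is exhausted
        for row, value in zip_longest(csv.reader(read_f), new_list,
                                      fillvalue=missing):
            if row is missing:
                break  # more values than rows: ignore the extras
            row.append(fill if value is missing else value)
            csv_writer.writerow(row)
```

Whether silent padding is preferable to an error depends on the data; for strict inputs, the original behaviour of failing loudly may be the safer choice.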


dna-data
2

I don't see where you're adding the new column, but try this:

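    import csv

    # Read the new column's values, one per line, without trailing newlines
    with open("newcolumn.csv", "r") as f:
        berry = [line.rstrip("\n") for line in f]

    with open("input.csv", "r", newline="") as csvinput:
        with open("output.csv", "w", newline="") as csvoutput:
            writer = csv.writer(csvoutput)
            # Pair each input row with the matching new-column value
            for row, value in zip(csv.reader(csvinput), berry):
                writer.writerow(row + [value])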
manicphase
2

This code should do what you asked; I have tested it on the sample data.

import csv

# in_path and out_path are the paths of the input and output files
with open(in_path, 'r') as f_in, open(out_path, 'w', newline='') as f_out:
    csv_reader = csv.reader(f_in, delimiter=';')  # adjust the delimiter to match your file
    writer = csv.writer(f_out)

    for row in csv_reader:
        writer.writerow(row + [row[0]])
Ashwaq
2

For a large file, you can use pandas.read_csv with the chunksize argument, which lets you read the dataset in chunks:

import pandas as pd

INPUT_CSV = "input.csv"
OUTPUT_CSV = "output.csv"
CHUNKSIZE = 1_000 # Maximum number of rows in memory

header = True
mode = "w"
for chunk_df in pd.read_csv(INPUT_CSV, chunksize=CHUNKSIZE):
    chunk_df["Berry"] = chunk_df["Name"]
    # You can apply any other transformation to the chunk here
    # ...
    # index=False keeps the row index out of the output file
    chunk_df.to_csv(OUTPUT_CSV, header=header, mode=mode, index=False)
    header = False # Do not write the header for the remaining chunks
    mode = "a" # 'a' stands for append mode; all remaining chunks are appended

If you want to update the file in place, you can write to a temporary file and then replace the original with it at the end:

import os

import pandas as pd

INPUT_CSV = "input.csv"
TMP_CSV = "tmp.csv"
CHUNKSIZE = 1_000 # Maximum number of rows in memory

header = True
mode = "w"
for chunk_df in pd.read_csv(INPUT_CSV, chunksize=CHUNKSIZE):
    chunk_df["Berry"] = chunk_df["Name"]
    # You can apply any other transformation to the chunk here
    # ...
    # index=False keeps the row index out of the output file
    chunk_df.to_csv(TMP_CSV, header=header, mode=mode, index=False)
    header = False # Do not write the header for the remaining chunks
    mode = "a" # 'a' stands for append mode; all remaining chunks are appended

os.replace(TMP_CSV, INPUT_CSV)
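The same write-then-replace idea also works without pandas. The sketch below swaps in the standard csv and tempfile modules and streams one row at a time; `add_column_in_place` is a hypothetical name, and the file is assumed to be comma-separated with a header row.

```python
import csv
import os
import tempfile


def add_column_in_place(path, header='Berry', source_index=0):
    """Rewrite `path` with an extra column copied from column source_index."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that os.replace
    # does not have to move data across filesystems
    fd, tmp_path = tempfile.mkstemp(suffix='.csv', dir=directory)
    try:
        with os.fdopen(fd, 'w', newline='') as f_out, \
                open(path, 'r', newline='') as f_in:
            writer = csv.writer(f_out)
            header_written = False
            for row in csv.reader(f_in):
                if not row:
                    continue  # skip blank lines
                # The first non-blank row is treated as the header row
                extra = row[source_index] if header_written else header
                header_written = True
                writer.writerow(row + [extra])
        os.replace(tmp_path, path)  # swap the new file in for the original
    except BaseException:
        os.remove(tmp_path)  # clean up the partial file on any failure
        raise
```

Because the original is only replaced after the new file has been written completely, an error halfway through leaves the input untouched. Note that mkstemp creates the file with restrictive permissions, so the replaced file may end up with different permissions from the original.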
2

You may just write:

import pandas as pd
import csv
df = pd.read_csv('csv_name.csv')
df['Berry'] = df['Name']
df.to_csv("csv_name.csv",index=False)

Then you are done. To check it, you may run:

h = pd.read_csv('csv_name.csv') 
print(h)

If you want to add a column with arbitrary new elements (a, b, c), you may replace the line df['Berry'] = df['Name'] with the following (the list must contain exactly one element per row of the file):

df['Berry'] = ['a','b','c']
1

Append a new column to an existing CSV file using Python, without a header name:

import csv

default_text = 'Some Text'
# Open the input file in read mode and the output file in write mode
with open('problem-one-answer.csv', 'r') as read_obj, \
        open('output_1.csv', 'w', newline='') as write_obj:
    # Create a csv.reader object from the input file object
    csv_reader = csv.reader(read_obj)
    # Create a csv.writer object from the output file object
    csv_writer = csv.writer(write_obj)
    # Read each row of the input CSV file as a list
    for row in csv_reader:
        # Append the default text to the row
        row.append(default_text)
        # Write the updated row to the output file
        csv_writer.writerow(row)

Thank you.