3

Here's an example of the contents of my CSV file:

Fruit, colour, ripe,

apple, green,,
banana, yellow,,
pineapple, green,,
plum, purple,,

I want to loop through the contents of the CSV file and according to a test (extrinsic to the CSV data, using an input value supplied to the enclosing function), end up with something like this:

Fruit, colour, ripe,

apple, green, true, 
banana, yellow,, 
pineapple, green,, 
plum, purple, true,

My current code looks like this:

csv_data = csv.reader(open('./data/fruit_data.csv', 'r'))
for row in csv_data:
    fruit = row[0]
    if fruit == input:
    # Here, write 'true' in the 'ripe' column.

It's easy enough to add new data in one go, using the CSV module or pandas, but here I need to add the data iteratively. It seems that I can't change the CSV file in place(?), but if I write out to a different CSV file, it's going to overwrite on each match within the loop, so it'll only reflect that value.

CDspace
  • 2,639
  • 18
  • 30
  • 36
arokath
  • 233
  • 3
  • 9

4 Answers4

3

You have, basically, two approaches:

1- Open a second text file before your loop then loop through each row of the initial file and append rows to the second file. After all rows are done, close the initial file. Example: How do you append to a file?

2- Read in everything from the initial csv. Then make changes to the object you created (I highly recommend using Pandas for this). Then write out to a csv. Here's an example of that method:

import pandas as pd
import numpy as np

# read in the csv
csv_data = pd.read_csv('./data/fruit_data.csv')

# I'm partial to the numpy where logic when creating a new column based 
# on if/then logic on an existing column
csv_data['ripe'] = np.where(csv_data['fruit']==input, True, False)

# write out the csv
csv_data.to_csv('./data/outfile.csv')

The choice between 1 and 2 should really come down to scale. If your csv is so big that you can't read it all in and manipulate it the way you want, then you should molest it line by line. If you can read the whole thing in and then manipulate it with Pandas, your life will be MUCH easier.

Community
  • 1
  • 1
JD Long
  • 59,675
  • 58
  • 202
  • 294
  • 1
    These aren't super large files, so I might look into option two. If I understand correctly: read out the original file using [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv), manipulate the dataframe, then write out using [pandas.DataFrame.to_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html) ? – arokath Mar 29 '17 at 22:39
  • i added an example of the second option showing reading in and out using pandas. I wasn't sure your logic for populating `ripe` – JD Long Mar 30 '17 at 13:42
  • I really like this method, very elegant. I much appreciate the extra help. Thank you so much! – arokath Mar 30 '17 at 23:39
1

In order to make the changes, you have to add your new data to a location such as a list. This list will contain the results of your processing.

fruit_details= list()
csv_data = csv.reader(open('./data/fruit_data.csv', 'r'))
for row in csv_data:
    fruit = row[0]
    if fruit == input:
       fruit_details.append([row[0],row[1],'true'])

The resulting list fruit_details, will then contain the fruits with true in the "ripe" column. If you one to append the non-fruit items, add and else statement which either puts false or row[2] as necessary.

Prof3sa
  • 9
  • 3
1

If you create a temporary file, you can write your rows as you read them. If you use os.rename on Unix, "the renaming will be an atomic operation":

import csv
import os

def update_fruit_data(input):
    csv_file_name = 'data/fruit_data.csv'
    tmp_file_name = "%s.tmp" % csv_file_name
    # Update fruit data
    with open(csv_file_name, 'r') as csv_input_file:
        csv_reader = csv.reader(csv_input_file)
        with open(tmp_file_name, 'w') as csv_output_file:
            csv_writer = csv.writer(csv_output_file)
            for row in csv_reader:
                fruit = row[0]
                if fruit == input:
                    row[2] = 'true'
                csv_writer.writerow(row)
    # Rename tmp file to csv file
    os.rename(tmp_file_name, csv_file_name)

while True:
    input = get_input()
    update_fruit_data(input)

The get_input here is a stand-in for whatever you use to get the value of input.

  • Thank you, I ended up doing it a different way, but I'll certainly refer back to this in future. – arokath Mar 30 '17 at 23:34
1

If you want to create a new CSV file

csv_data = csv.reader(open('./Desktop/fruit_data.csv', 'r'))
csv_new = csv.writer(open('./Desktop/fruit_new_data.csv', 'w'))
for row in csv_data:
    fruit = row[0]
    if fruit == input:
        row.append("ripe")
        csv_new.writerow(row)
    else:
        csv_new.writerow(row)

Basically the only thing missing in your previous question is the last statment, which is to write, else is add in case the criteria does not match.

Another possibility could be to use linestartswith

Aly Abdelaziz
  • 292
  • 4
  • 24