How can I prevent users from entering the same input twice when writing directly to .csv in python

Question

I'm Trying to create a data-set from user input. I am trying to prevent duplicates for one of the fields. I ask the user to pick a letter that matches their name, their age, their gpa and major. I want to make sure the letters entered are unique, but I am not sure how to do this when writing directly to a .csv file.

here is what I have so far.

import csv
from colorama import Fore, Back, Style


with open('students2.csv', 'w+', newline='') as csvfile:
        columnheaders = ['NAME','AGE','GPA','MAJOR']
        writer = csv.DictWriter(csvfile, fieldnames=columnheaders)

        writer.writeheader()

        for i in range(0,10):
            askname=input('Please select the letter that matches your name from the following:  (A, B, C, D, E, F, G, H, I, J),  ')
            askage=input('Please enter your Age: ')
            askgpa=input('Please enter your GPA: ')
            askmajor=input('Please select your major from the following (CS, CET, CIS, CE) ')
            writer.writerow({'NAME': askname,'AGE': askage,'GPA': askgpa,'MAJOR': askmajor}) 
            print(Back.BLACK +'My name starts with the letter:', askname ,' and I am ', askage, 'years old. I study ', askmajor, 'and my GPA is: ', askgpa)
            print(Style.RESET_ALL)

I know how to do this with a list,

namelist = []

while True:

    #Input name
    while True:
        name = str(input('What is your name? '))
        if name.upper() not in ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'):
            print("Please use (A, B, C, D, E, F, G, H, I, J).")
            continue
        if name in namelist:
            print("This name has already been used.")
            continue
        else:
            namelist.append(name)
            break

But is it possible to do this without having to do it via a list and then convert it to .csv?

Any help will be appreciated it. Thanks in advance.

Why "without having to do it via a list and then convert it to .csv"? That seems like a pretty reasonable solution. — ChrisGPT was on strike, Nov 16 '19 at 03:27
@Chris wondering if it can be done in one process vs. two separate ones. — JPnova87, Nov 16 '19 at 03:31
@JPnova87 What do you mean by _process_? That sounds like the kind of word we could spend an eternity arguing about. — AMC, Nov 16 '19 at 03:32
@AlexanderCécile my process I mean: ask user for input and store it as a list, and then convert the list to csv file VS asking for user input and storing it directly to csv file. While making sure the user doesn't pick a letter that has already been picked(A-J) — JPnova87, Nov 16 '19 at 03:36
You're either going to have to keep track of inputs in memory while writing your file (two "processes") or read your file back and look at recorded values (two "processes", plus a ton of inefficiency). The input→list→CSV solution is at least as simple and much more elegant. (Note that "process" means something very different from what you're saying in computing.) — ChrisGPT was on strike, Nov 16 '19 at 03:42
@JPnova87 Since the program is made to accept all 10 inputs in a row, it might be useful to display the list of available letters, right? — AMC, Nov 16 '19 at 03:47
Now that I’ve taken another look at it, that second code example is bizarre. Doesn’t it loop forever? — AMC, Nov 16 '19 at 03:57
@AlexanderCécile good suggestion, maybe I can print the list of available letters for the user. — JPnova87, Nov 16 '19 at 04:05
@JPnova87 Just because i’m curious, why are these user “names” restricted like this? Is it as part of an assignment or similar? — AMC, Nov 16 '19 at 04:07
@AlexanderCécile is part of an assignment, no idea why the requester doesn't want no duplicate "names" — JPnova87, Nov 16 '19 at 04:09
would you be able to show one line of the csv file for format, and are you interested in an answer with pandas that includes your questions to do what your looking for? I upvoted your question, i think it's a great question, csv is difficult to deal with directly. — oppressionslayer, Nov 16 '19 at 04:23
@oppressionslayer yes definitely, I am open to use pandas if needed. — JPnova87, Nov 16 '19 at 04:30
@oppressionslayer After the code runs the CSV file looks like this: ``` NAME AGE GPA MAJOR A 20 3.2 CIS B 31 4 CS C 34 3.5 CE D 18 2 CS ``` — JPnova87, Nov 16 '19 at 04:35
I added a pandas solution, let me know if you think it's worth pursuing and I can modify it with any suggestions. — oppressionslayer, Nov 16 '19 at 05:05

score 4 · Answer 1 · answered Nov 16 '19 at 03:47

You will need to keep a copy of the list in memory (you could scan the CSV each time, but that would result in a lot of needless disk IO).

My suggestion would be to cache the names in a set, so you'd have something like nameseen = set() at the top of your script and then you'd check it before writing the row. Something like:

if not (askname in nameseen):
    writer.writerow({'NAME': askname,'AGE': askage,'GPA': askgpa,'MAJOR': askmajor}) 
    nameseen.add(askname)

    print(Back.BLACK +'My name starts with the letter:', askname ,' and I am ', askage, 'years old. I study ', askmajor, 'and my GPA is: ', askgpa)
    print(Style.RESET_ALL)
else:
    print("This name has already been used.")

oppressionslayer · Accepted Answer · 2019-11-17T19:32:14.790

If you can use pandas, you can do it like this:

import pandas as pd
df = pd.read_csv('kd.csv', index_col=0)
df.to_csv()
# 'NAME,AGE,GPA,MAJOR\nBill,18,4.0,CS\nMike,20,2.9,BS\nWill,20,2.4,CS\nBill,18,4.0,CS\n'

df.drop_duplicates(subset=None, inplace=True)
df.to_csv()
# 'NAME,AGE,GPA,MAJOR\nBill,18,4.0,CS\nMike,20,2.9,BS\nWill,20,2.4,CS\n'

UPDATE

I changed it to update with your comments. Some updates, creates the file if it doesn't exist and still working on improving it with your comments. You can hit CTRL-D if you get stuck in an infinite loop.

$ cat kd2.csv
NAME AGE GPA MAJOR
A 20 3.2 CIS
B 31 4.0 CS
C 34 3.5 CE
D 18 2.0 CS
E 4.0 3.2 CE

import io

def new_student_add():
  only_allowed = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
  stub = io.StringIO('NAME AGE GPA MAJOR\n')
  while True:
     try:
        df = pd.read_csv('kd4.csv', delim_whitespace=True, index_col=0) 
     except:
        stub.seek(0)
        df = pd.read_csv(stub, delim_whitespace=True, index_col=0)
     new_csv_dict = {}
     try:
        new_name =input('Please select the letter that matches your name from the following:  (A, B, C, D, E, F, G, H, I, J):  ')
     except:
        break
     if new_name not in only_allowed:
        print("Only letters in {} are allowed".format(only_allowed))
        continue
     if  new_name in df.index: 
        print("This name has already been used.")
        continue
     new_csv_dict['AGE'] =input('Please enter your Age: ')
     new_csv_dict['GPA'] =input('Please enter your GPA: ')
     new_csv_dict['MAJOR'] =input('Please select your major from the following (CS, CET, CIS, CE) ')
     df.loc[new_name] = new_csv_dict   
     break 
  df.to_csv(r'kd4.csv', sep=' ')
  return df 

for x in range (0,9):
   df = new_student_add()


for name, row in  df.iterrows(): 
  print("My name starts with the letter {} and I am {} years old. I study {} and my GPA is: {}".format(name, int(row['AGE']), row['MAJOR'], row['GPA']))

# This may be much faster, so adding it in in case the author needs a faster algorithm. Thanks AlexanderCécile
# for item in df.itertuples(): 
# print(f"My name starts with the letter {item[0]} and I am {item[1]} years old. I study {item[3]} and my GPA is: {item[2]}")

I’m not sure this is same functionality as OP wanted though. It seems like he wanted the user to keep retrying until they found a name which was available, no? — AMC, Nov 16 '19 at 03:59
Thanks for the help! I tried this but @AlexanderCécile is correct. I need to have the user keep trying until they pick an available "name/ letter" — JPnova87, Nov 16 '19 at 04:16
That looks really good! seeing it in a function would be nice! So in the end the code should: Ask the user to enter a letter that matches their name from: (A, B, C, D, E, F, G, H, I, J), AGE, GPA (1-4) and MAJOR from (CS, CET, CIS, CE). Then print these information as follows: My name starts with the letter (NAME) and I am (AGE) years old. I study (MAJOR) and my GPA is: (GPA). Repeat step 1 for 10 users with different values (Names can’t be repeated). Store the entered information into a csv file. — JPnova87, Nov 16 '19 at 05:24
Don't use `iterrows`, it has some major disadvantages! `itertuples` provides essential the same functionality, without any of the flaws. See [this post](https://stackoverflow.com/q/24870953/11301900), for example. Also, why aren't you using f-strings? — AMC, Nov 17 '19 at 02:47
It's an easy change, i can update it if JPnova87 would like to see it's use that way, it's this code: #for item in df.itertuples(): # print(f"My name starts with the letter {item[0]} and I am {item[1]} years old. I study {item[3]} and my GPA is: {item[2]}") — oppressionslayer, Nov 17 '19 at 03:03
@AlexanderCécile Usually i shy away from f' format unless i'm 100% sure python3 is in use, sometime i get that doesn't work from users, but i do use them quite often. I can't wait until 3.8 takes off, the walrus operator is awesome! — oppressionslayer, Nov 17 '19 at 03:05
@oppressionslayer Same, really looking forward to 3.8 :D Really though, `iterrows()` messes with the data types, and in my own tests is **1000 times slower** than `itertuples()` on simple tasks. — AMC, Nov 17 '19 at 19:27

How can I prevent users from entering the same input twice when writing directly to .csv in python

2 Answers2