So i'm new to python, I want to discover it's potential and get more knowledge about what I can do with it. I did this code to compare CSVs, basically what it does you provide it 2 CSVs, CSV1 that has some id column and a column with values that you want to add to other CSV (CSV2)
Note: This script does exactly what I want, and it seems to work fine, hope it can also be useful to someone, my question is really what I can do to improve it's performance or even make the code cleaner
# Made by Varqas
# CSV1 = CSV containing values that can be matched in CSV2 and a column that will be added
# CSV2 = CSV containing values that can be matched and column that will be concatenated at the end of the CSV (The last column values should be empty)
with open('csv1.csv', encoding="utf8") as check_file:
# Get Column that will be used to Compare values and add it to a list
columnToCompare = list([row.split(',')[0].strip() for row in check_file])
with open('csv1.csv', encoding="utf8") as check_file:
# Get Column that will be used to add to a row values and add it to a list
columnToAdd = list([row.split(',')[2].strip() for row in check_file])
with open('csv2.csv', 'r', encoding="utf8") as in_file, open('out.csv', 'w', encoding="utf8") as out_file:
i = 0
# For each Row in CSV2
for line in in_file:
# Write Headers
if i == 0:
out_file.write(line)
else:
# GET Column on CSV2 containing value that will be compared on CVS1
value = line.split(',')[1].strip()
# Check if first Column value on CSV2 either variable is in
if value in columnToCompare:
# Check for duplicates in the list
numberOfOccurences = list(columnToCompare).count(value)
concatRow = ""
if numberOfOccurences > 1:
# Concatenate all values of occurences
for x in range(numberOfOccurences):
index = list(columnToCompare).index(value)
concatRow = concatRow + columnToAdd[index]
if x != numberOfOccurences - 1:
concatRow = concatRow + " + "
# Remove value so list.index doesn't found same row
columnToCompare[index] = ""
else:
# Add other row that doesn't match
index = list(columnToCompare).index(value)
concatRow = columnToAdd[index]
# Concat to last column of CSV2
out_file.write(line.strip() + concatRow + "\n")
else:
# Still concat value in CSV2 to last column if not found in csv1
out_file.write(line.strip() + "not found" + "\n")
i = i + 1
I know it can be improved, and minified perhaps using some libs... Let me know what you think!
I tried using pd merge, but I didn't understand quite well how I could add concatenation and values inside it.