I have a program that reads a .csv file, checks for any mismatch in column length (by comparing it to the header-fields), which then returns everything it found out as a list (and then writes it into a file). What I want to do with this list, is to list out the results as follows:
row numbers where the same mismatch is found : the amount of columns in that row
e.g.
rows: n-m : y
where n and m are the numbers of rows which share the same amount of columns that mismatch to header.
I have looked into these topics, and while the information is useful, they do not answer the question:
Find and list duplicates in a list?
Identify duplicate values in a list in Python
This is where I am right now:
r = csv.reader(data, delimiter= '\t')
columns = []
for row in r:
# adds column length to a list
colm = len(row)
columns.append(colm)
b = len(columns)
for a in range(b):
# checks if the current member matches the header length of columns
if columns[a] != columns[0]:
# if it doesnt, write the row and the amount of columns in that row to a file
file.write("row " + str(a + 1) + ": " + str(columns[a]) + " \n")
the file output looks like this:
row 7220: 0
row 7221: 0
row 7222: 0
row 7223: 0
row 7224: 0
row 7225: 1
row 7226: 1
when the desired end result is
rows 7220 - 7224 : 0
rows 7225 - 7226 : 1
So I what I essentially need, the way i see it, is an dictionary where key is the rows with duplicate value and value is the amount of columns in that said mismatch. What I essentially think I need (in a horrible written pseudocode, that doesn't make any sense now that I'm reading it years after writing this question), is here:
def pseudoList():
i = 1
ListOfLists = []
while (i < len(originalList)):
duplicateList = []
if originalList[i] == originalList[i-1]:
duplicateList.append(originalList[i])
i += 1
ListOfLists.append(duplicateList)
def PseudocreateDict(ListOfLists):
pseudoDict = {}
for x in ListOfLists:
a = ListOfLists[x][0] #this is the first node in the uniqueList created
i = len(ListOfLists) - 1
b = listOfLists[x][i] #this is the last node of the uniqueList created
pseudodict.update('key' : '{} - {}'.format(a,b))
This however, seems very convoluted way for doing what I want, so I was wondering if there's a) more efficient way b) an easier way to do this?