This refers to the solution stated in Python: Comparing two CSV files and searching for similar items
I want to know what the syntax
masterlist = [row for row in c2]
means in the solution.
It is a list comprehension, but in this particular case a useless one. It's more efficient to just do:
masterlist = list(c2)
as that would also copy all elements from the c2
iterable into a new list object.
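To check that the two spellings really give the same list, here is a minimal sketch. It assumes Python 3 and an in-memory CSV; the `sample` text and the `make_reader` helper are made up for illustration and stand in for the `c2` reader from the referenced answer:

```python
import csv
import io

# Hypothetical stand-in for the file behind c2 in the referenced answer.
sample = "a,b,c\n1,2,3\n"

def make_reader():
    # A csv.reader is a one-shot iterator, so build a fresh one for each pass.
    return csv.reader(io.StringIO(sample))

via_comprehension = [row for row in make_reader()]
via_list = list(make_reader())

print(via_comprehension == via_list)  # True
```

Either way the reader is consumed once and every row ends up in a new list; `list()` simply skips the per-row loop written out in Python.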
A list comprehension would normally be used to filter or transform elements from an iterable; say to pick only certain rows, or only a specific column from each row:
masterlist = [row[1] for row in c2]
would pick out just the one column the referenced answer actually uses in their code.
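As a rough illustration of both uses, transforming and filtering, here is a small sketch. It assumes Python 3; the `sample` data and its column layout are invented for the example and are not taken from the referenced question:

```python
import csv
import io

# Made-up CSV text: a header line followed by three data rows.
sample = "id,host,ip\n1,alpha,10.0.0.1\n2,beta,10.0.0.2\n3,gamma,10.0.0.3\n"
rows = list(csv.reader(io.StringIO(sample)))

# Transform: keep only the second column of every row.
hosts = [row[1] for row in rows]

# Filter: keep only rows whose first column is numeric (this skips the header).
data_rows = [row for row in rows if row[0].isdigit()]

print(hosts)           # ['host', 'alpha', 'beta', 'gamma']
print(len(data_rows))  # 3
```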
The accepted answer there is missing out on several Python idioms and best practices and operates in quadratic time (duration to process is a function of the size of hosts.csv
times the size of masterlist.csv
); it should be rewritten to:
import csv

# Python 2 file modes; on Python 3, open with 'r'/'w' and newline='' instead.
with open('hosts.csv', 'rb') as hosts, open('masterlist.csv', 'rb') as master:
    with open('results.csv', 'wb') as results:
        reader = csv.reader(hosts)
        writer = csv.writer(results)
        # Map the second column of each master row to its 1-based row number.
        master_indices = {r[1]: i for i, r in enumerate(csv.reader(master), 1)}
        for row in reader:
            index = master_indices.get(row[3])
            if index is not None:
                message = 'FOUND in master list (row {})'.format(index)
            else:
                message = 'NOT FOUND in master list'
            # Append the result so it is actually written out.
            row.append(message)
            writer.writerow(row)
Here I replaced the list comprehension with a dictionary comprehension, creating a dictionary that maps the second column of each CSV row to a row number (produced with enumerate()). The test for whether row[3] from the hosts.csv file is present in that column is now reduced to a single dict.get() call, which maps directly to the row number.
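The lookup idea can be tried without any files. The following is only a sketch under assumptions: it targets Python 3, and the two CSV strings are made-up data shaped so that the key sits in the second column of the master rows and the fourth column of the host rows:

```python
import csv
import io

# Made-up data: the second column of master holds the key to look up.
master_text = "x,alpha\ny,beta\nz,gamma\n"
hosts_text = "1,a,b,beta\n2,a,b,delta\n"

# Build the index once: key -> 1-based row number in the master data.
master_indices = {
    r[1]: i for i, r in enumerate(csv.reader(io.StringIO(master_text)), 1)
}

for row in csv.reader(io.StringIO(hosts_text)):
    index = master_indices.get(row[3])  # None when the key is absent
    print(row[3], index)
```

Each lookup is a single hash probe instead of a scan over every master row, which is what removes the quadratic behaviour. One caveat: if a key occurs more than once in the master data, the dictionary keeps the last row number seen.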
Usually, list comprehensions are a quick way to map one list to another with a transforming expression. It's like calling map() but with syntactic sugar.
e.g.
list_ = [1, 2, 3, 4, 5]  # or range(1, 6)
squares = [x*x for x in list_]
print(squares)
# [1, 4, 9, 16, 25]
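To make the comparison with map() concrete, here is a short sketch, assuming Python 3 (where map() returns a lazy iterator rather than a list). The variable names are chosen for illustration only:

```python
numbers = [1, 2, 3, 4, 5]

squares_comprehension = [x * x for x in numbers]
# On Python 3, map() is lazy, so wrap it in list() before comparing.
squares_map = list(map(lambda x: x * x, numbers))

# A comprehension can also filter, which map() alone cannot do.
even_squares = [x * x for x in numbers if x % 2 == 0]

print(squares_comprehension == squares_map)  # True
print(even_squares)                          # [4, 16]
```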