0

Hey all, I have two databases. One with 145000 rows and approx. 12 columns. I have another database with around 40000 rows and 5 columns. I am trying to compare based on two columns values. For example if in CSV#1 column 1 says 100-199 and column two says Main St(meaning that this row is contained within the 100 block of main street), how would I go about comparing that with a similar two columns in CSV#2. I need to compare every row in CSV#1 to each single row in CSV#2. If there is a match I need to append the 5 columns of each matching row to the end of the row of CSV#2. Thus CSV#2's number of columns will grow significantly and have repeat entries, doesnt matter how the columns are ordered. Any advice on how to compare two columns with another two columns in a separate database and then iterate across all rows. I've been using python and the import csv so far with the rest of the work, but this part of the problem has me stumped.

Thanks in advance -John

John_U262D
  • 31
  • 1
  • 4
  • sorry I meant to say: If there is a match I need to append the ####12#### columns of each matching row to the end of the row of CSV#2 – John_U262D Feb 17 '11 at 16:07
  • Do not comment on your own question. It's **your** question. Please **update** your question and remove the comment. Also, please consider using paragraphs to break the big block of text into separate sections so we can read it. – S.Lott Feb 17 '11 at 16:09
  • 1
    Have you considered using a database instead of CSV files? This would make this really easy. – Sven Marnach Feb 17 '11 at 16:12

1 Answers1

2
  1. A csv file is NOT a database. A csv file is just rows of text-chunks; a proper database (like PostgreSQL or Mysql or SQL Server or SQLite or many others) gives you proper data types and table joins and indexes and row iteration and proper handling of multiple matches and many other things which you really don't want to rewrite from scratch.

  2. How is it supposed to know that Address("100-199")==Address("Main Street")? You will have to come up with some sort of knowledge-base which transforms each bit of text into a canonical address or address-range which you can then compare; see Where is a good Address Parser but be aware that it deals with singular addresses (not address ranges).

Edit:

Thanks to Sven; if you were using a real database, you could do something like

SELECT
    User.firstname, User.lastname, User.account, Order.placed, Order.fulfilled
FROM
    User
    INNER JOIN Order ON
        User.streetnumber=Order.streetnumber
        AND User.streetname=Order.streetname

if streetnumber and streetname are exact matches; otherwise you still need to consider point #2 above.

Community
  • 1
  • 1
Hugh Bothwell
  • 55,315
  • 8
  • 84
  • 99
  • 1
    Regarding 2, the OP wants to use two columns as key for the join. The other "database" would also contain a row with exactly the same two values. – Sven Marnach Feb 17 '11 at 17:24
  • ok thanks I haven't used SQL before, but I'm reading up on how to import a csv into it then I'll use that sort of query. I also just decided to create a new column with "100-199 Main Street" which should simplify it further. – John_U262D Feb 17 '11 at 19:04