0

I am currently working with two csv files, base.csv and another csv file, output_20170503.csv which will be produced everyday, so my aim here is to rebase every output so that they have the same data as the base.csv.

My base.csv:

ID,Name,Number,Shape,Sound
1,John,45,Round,Meow
2,Jimmy,78,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,50,Triangle,Meow
5,Nyancat,,Round,Quack

My output_20170503.csv

ID,Name,Number,Shape,Sound
1,John,,Round,Meow
2,Jimmy,,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,,Triangle,
5,Nyancat,,Round,Quack
6,Marc,,Square,Woof
7,Jonnn,,Hexagon,Chirp

The objective here is to rebase the data (ID from 1-5) from base.csv with the output_20170503.csv

What I want to achieve:

ID,Name,Number,Shape,Sound
1,John,45,Round,Meow
2,Jimmy,78,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,50,Triangle,Meow
5,Nyancat,,Round,Quack
6,Marc,,Square,Woof
7,Jonnn,,Hexagon,Chirp

I already searched for the solution but what I got;

Merge two csv files (both of csv files have different columns, won't work for me)

Remove duplicates from a csv files (Appending base.csv with the output_20170503.csv and then remove the duplicates, won't work because they have different values for column Number)

Any help would be appreciated, thank you.

Community
  • 1
  • 1
yunaranyancat
  • 131
  • 1
  • 1
  • 13

2 Answers2

1

You can try this, I use first two item as key and generate a dict and then iterate the new dict update the base dict if the key not in base:

new = {"".join(i.split(',')[:2]): i[:-1].split(',') for i in open('output_20170503.csv')}
base = {"".join(i.split(',')[:2]): i[:-1].split(',') for i in open('base.csv')}

base.update({i: new[i] for i in new if i not in base})
f=open("out.csv","w")
for i in sorted(base.values(), key=lambda x: x[0]):
    if i[0]!="ID":
        f.write(",".join(i)+"\n")

Output:

1,John,45,Round,Meow
2,Jimmy,78,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,50,Triangle,Meow
5,Nyancat,,Round,Quac
6,Marc,,Square,Woof
7,Jonnn,,Hexagon,Chir

Python2.7+ supports the syntactical extension called the "dictionary comprehension" or "dict comprehension", so if you're using Python2.6, you need to replace the first three lines with:

new = dict(("".join(i.split(',')[:2]),i[:-1].split(',')) for i in open('output_20170503.csv'))
base = dict(("".join(i.split(',')[:2]),i[:-1].split(',')) for i in open('base.csv'))

base.update(dict((i,new[i]) for i in new if i not in base))
McGrady
  • 10,869
  • 13
  • 47
  • 69
  • I got syntax error on the for loop , btw, I'm using python v2.6.. `new = {"".join(i.split(';')[:1]): i[:-1].split(';') for i in open(new_devicedb)}` – yunaranyancat May 04 '17 at 01:02
  • Thank you, but the base update also has Syntax Error at the for loop. `base.update({i: new[i] for i in new if i not in base})` Do I need to import any library? – yunaranyancat May 04 '17 at 01:19
  • @yunaranyancat Update. – McGrady May 04 '17 at 01:38
  • By changing the ',' to ';', would it be the same if the values were semicolon delimited? – yunaranyancat May 04 '17 at 05:39
  • @yunaranyancat sure – McGrady May 04 '17 at 05:41
  • Last question, if I want to use only the first column of the file, I need to change from `new = {"".join(i.split(',')[:2]): i[:-1].split(',') for i in open('output_20170503.csv')}` to `new = {"".join(i.split(',')[:1]): i[:-1].split(',') for i in open('output_20170503.csv')}` ? – yunaranyancat May 04 '17 at 06:30
  • Actually, `"".join(i.split(',')[:2])` is just the key of the dict, I make this dict to update the `base.csv` . The code iterates the file , `i[:-1].split(',')` split the line to a list (like `['1','John','45','Round','Meow']`). You can get the slicing of list see [this question](http://stackoverflow.com/questions/509211/explain-pythons-slice-notation). You need to read the Python docs to learn more. – McGrady May 04 '17 at 06:44
0

You should try to use pandas library which is excellent for data manipulation. you can read easily csv files and do merge operation. Your solution might look like the following :

import pandas as pd

base_df = pd.read_csv('base.csv')
output_df = pd.read_csv('My output_20170503.csv')

output_df.update(base_df)

output_df.write_csv('My output_20170503.csv')

The missing values on output_df has now been updated with the one from base_df.

MathiasDesch
  • 352
  • 3
  • 15