I have two CSV data files.

The first:

v1,v2,v3,....v100
-0.6662942866484324,-1.0799718232204516,1.843649258216222,....1.0950462520122528
0.7452152929104426,-0.6032845087431591,0.7041161138126079,....-0.41362931908053513

The second:

c1,c2,c3,c4,c5
4,1,0,0,1
14,2,2,0,13

When I combine them using my code, the result looks like this:

v1,v2,v3..v100,c1,c2,c3,c4,c5
0.0,1.0,2,...0,0,0,1,0,0

My code is:

import pandas as pd
vector = pd.read_csv('../data/vector_data.csv',encoding = "ISO-8859-1")
cluster= pd.read_csv('../data/data_cluster.csv',encoding = "ISO-8859-1")
data=vector.merge(cluster, left_on='v1', right_on='c1')
export_csv = data.to_csv (r'../data/merge_label.csv',index=False)

The result should look like this:

v1,v2,v3..v100,c1,c2,c3,c4,c5
-0.6662942866484324,-1.0799718232204516,1.843649258216222,....1.0950462520122528,4,1,0,0,1

Please help me.

3 Answers

1

Pandas is not needed here. Plain file I/O can join the two files line by line:

# Join the two files line by line; all three handles are closed on exit.
with open('first.csv') as f1, open('second.csv') as f2, open('third.csv', 'w') as fh:
    for f, s in zip(f1, f2):
        fh.write(f.rstrip('\n') + ',' + s)
piRSquared
  • Why wouldn't you want to use pandas in this use-case? – Umar.H Jul 19 '19 at 20:51
  • If the application is just to merge CSV files, importing pandas might be too much overhead. Imagine this in a command-line tool that needs to run many times. It just isn't necessary to load a heavyweight data analytics library to do something this simple. – piRSquared Jul 19 '19 at 20:53
  • Thanks as always, you are a great teacher. – Umar.H Jul 19 '19 at 20:54
  • Now I know what you meant :) – anky Jul 20 '19 at 04:05
0

Try updating the merge to this:

data=vector.merge(cluster, left_on='v1', right_on='c1', how='outer')

The default is how='inner', so it looks like the only value that v1 and c1 have in common may be 0, which would produce the single row you are seeing.
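
As a rough illustration of the difference between the default inner join and how='outer', here is a minimal sketch on made-up toy frames (the values are assumptions, not the asker's data; c1 is stored as float only so the toy key columns share a dtype). Note that merge matches rows by key value rather than by row position, so with how='outer' the unmatched rows are kept but padded with NaN instead of being placed side by side.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (hypothetical values).
vector = pd.DataFrame({"v1": [-0.67, 0.75, 0.0], "v2": [-1.08, -0.60, 1.0]})
cluster = pd.DataFrame({"c1": [4.0, 14.0, 0.0], "c2": [1, 2, 0]})

# Default how='inner': only rows whose v1 equals some c1 survive.
inner = vector.merge(cluster, left_on="v1", right_on="c1")

# how='outer': every row from both frames is kept; rows without a
# matching key get NaN in the other frame's columns.
outer = vector.merge(cluster, left_on="v1", right_on="c1", how="outer")

print(len(inner), len(outer))
print(outer)
```

Only the key 0 appears in both frames, so the inner result has a single row, while the outer result keeps all five distinct keys.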

Connor John
0

Can you try the following code and see if it works:

data=pd.concat([vector,cluster],axis=1)
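
To show what the concat call does, here is a small sketch on toy frames (the values are shortened stand-ins, not the real files). pd.concat with axis=1 aligns rows on the index, so the sketch resets both indexes first; that step is an added precaution for frames whose indexes differ and is not needed for two freshly read CSV files.

```python
import pandas as pd

# Toy stand-ins for vector_data.csv and data_cluster.csv (hypothetical values).
vector = pd.DataFrame({"v1": [-0.67, 0.75], "v2": [-1.08, -0.60]})
cluster = pd.DataFrame({"c1": [4, 14], "c2": [1, 2]})

# Place the frames side by side, pairing rows by position.
data = pd.concat(
    [vector.reset_index(drop=True), cluster.reset_index(drop=True)],
    axis=1,
)
print(data)
```

Each row of the first frame ends up next to the row at the same position in the second frame, which matches the layout the question asks for.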