0

enter image description here

There are 2 files opened with Pandas. If there are common parts in the first column of two files (colored letters), I want to paste the data of the second column of second file into the matched part of the first file. And if there is no match, I want to write 'NaN'. Is there a way I can do in this situation?

File1

enter code here
    0    1
0  JCW  574
1  MBM  4212
2  COP  7424
3  KVI  4242
4  ECX  424

File2

enter code here
     0      1
0  G=COP  d4ssd5vwe2e2
1  G=DDD  dfd23e1rv515j5o
2  G=FEW  cwdsuve615cdldl
3  G=JCW  io55i5i55j8rrrg5f3r
4  G=RRR  c84sdw5e5vwldk455
5  G=ECX  j4ut84mnh54t65y

File1#

enter code here
    0    1     2
0  JCW  574   io55i5i55j8rrrg5f3r
1  MBM  4212  NaN
2  COP  7424  d4ssd5vwe2e2
3  KVI  4242  NaN
4  ECX  424   j4ut84mnh54t65y
LoganLee
  • 145
  • 1
  • 10

2 Answers2

1

Use Series.str.extract for new Series for matched values by df1[0] values first and then merge with left join in DataFrame.merge:

df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)

s = df2[0].str.extract(f'({"|".join(df1[0])})', expand=False)

df = df1.merge(df2[[1]], how='left', left_on=0, right_on=s)
df.columns = np.arange(len(df.columns))
print (df)
     0     1                    2
0  JCW   574  io55i5i55j8rrrg5f3r
1  MBM  4212                  NaN
2  COP  7424         d4ssd5vwe2e2
3  KVI  4242                  NaN
4  ECX   424      j4ut84mnh54t65y

Or if need match last 3 values of column df1[0] use:

s = df2[0].str.extract(f'({"|".join(df1[0].str[-3:])})', expand=False)

df = df1.merge(df2[[1]], how='left', left_on=0, right_on=s)
df.columns = np.arange(len(df.columns))
print (df)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you for your comment. However, I got other issue with your suggestion. Results JCW,"574","io55i5i55j8rrrg5f3r"\n MBM,"4212","NaN"\n COP,"7424","d4ssd5vwe2e2"\n KVI,"4242","NaN"\n ECX,"424","j4ut84mnh54t65y"\n I got the result like this with "" in 2nd, 3rd columns. Why does this situation happen and How I can do remove the "" in this file? – LoganLee Nov 12 '21 at 05:47
  • @LoganLee - Is possible change data sample for see problem? Because not easy find it from comments. Thansk. – jezrael Nov 12 '21 at 06:18
  • I really appreciate you. Unfortunately, I don't want to leave my company by sharing the original file with you cuz my boss doesn't want the file to be exported. Thank you for your consideration and hope everything goes well to you. – LoganLee Nov 12 '21 at 07:13
  • @LoganLee - I understand. I think change data sample from commnets to question, still sample data, but data which not working for yuou. – jezrael Nov 12 '21 at 07:14
0

Have a look at the concat-function of pandas using join='outer' (https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html). There is also this question and the answer to it that can help you.

It involves reindexing each of your data frames to use the column that is now called "0" as the index, and then joining two data frames based on their indices.

Also, can I suggest that you do not paste an image of your dataframes, but upload the data in a form that other people can test their suggestions.

eandklahn
  • 537
  • 4
  • 8