0

I have 2 dfs

df_1

Nº.do Incidente Status  Description Per_Extracao
0   IN6948271   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_SASG_GD9822...    DE : 2022/01/05 ATÉ : 2022/12/08
1   IN6948304   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_AACE_R4539 ...    DE : 2022/01/05 ATÉ : 2022/12/08
2   IN6948307   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_ADAT_SPRK_EX...   DE : 2022/01/05 ATÉ : 2022/12/08
3   IN6948309   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_ADAT_SPRK_EX...   DE : 2022/01/05 ATÉ : 2022/12/08
4   IN6948310   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_ADAT_SPRK_EX...   DE : 2022/01/05 ATÉ : 2022/12/08
5   IN6948311   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_ADAT_SPRK_EX...   DE : 2022/01/05 ATÉ : 2022/12/08

df_2

    JOB_NAME    JOB_STREAM_NAME
0   PP_AACD_NR_D8706_TIHIBRIDA_PROC_EXCUC_D P26_AACD_FAC_TOD
1   PP_SASG_GD9822  P26_AACE_U08
2   PP_AACE_R4539   P26_AACE_U09
3   PP_AACE_R4539_CONS_JUNC P26_AACE_U08
4   PP_AACE_R4539_FMRC_TD_01    P26_AACE_U08
5   PP_AACE_R4539_FMRC_TD_02    P26_AACE_U08

I'm trying to merge then based on the value of JOB_NAME in df_2 founded in df_1(Description columns), it's a kind of a left join on sql, because I need everything from df_2

the output should be something like this

merged_df

Nº.do Incidente Status  Description Per_Extracao JOB_NAME JOB_STREAM_NAME
0   IN6948271   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_SASG_GD9822...    DE : 2022/01/05 ATÉ : 2022/12/08 PP_SASG_GD9822  P26_AACE_U08
1   IN6948304   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_AACE_R4539 ...    DE : 2022/01/05 ATÉ : 2022/12/08 PP_AACE_R4539 P26_AACE_U09
2   NaN   NaN   NaN   NaN PP_AACD_NR_D8706_TIHIBRIDA_PROC_EXCUC_D   P26_AACD_FAC_TOD
3   NaN   NaN   NaN   NaN PP_AACE_R4539_CONS_JUNC P26_AACE_U08
4   NaN   NaN   NaN   NaN PP_AACE_R4539_FMRC_TD_01 P26_AACE_U08 
5   NaN   NaN   NaN   NaN PP_AACE_R4539_FMRC_TD_02 P26_AACE_U08

tried to use some merge commands, but, none has worked, for example:

df_test = df_2.merge(df_2,left_on='JOB_NAME',right_on=df_1['Description'].str.extract(df_2['JOB_NAME']),how='left')

and

df_test = df_2.merge(df_2,left_on='JOB_NAME',right_on=df_1['Description'].str.contains(df_2['JOB_NAME']),how='left')
gfernandes
  • 193
  • 1
  • 10
  • I already posted you the solution for this few days ago, is there something wrong with it or why do you ask the same question again? – Rabinzel Aug 25 '22 at 15:12
  • Does this answer your question? [merging df in pandas based on "contains" values](https://stackoverflow.com/questions/73450488/merging-df-in-pandas-based-on-contains-values) – Emi OB Aug 25 '22 at 15:15
  • hey man, I aprecciate your help there, but, this time, it's not the same thing, if you have a look in this post, I'm trying a "left" join, not a "inner" join...would you be able to help me again on this? – gfernandes Aug 25 '22 at 15:19
  • oh, then I was a little fast on this, sorry. It just looked exactly the same :D. I will have another look – Rabinzel Aug 25 '22 at 15:23
  • @Rabinzel because it's the same data...:), but a different situation\resolution that I'm looking for – gfernandes Aug 25 '22 at 15:25

1 Answers1

1

You can use the same answer as in the other question, just add the keyword right to it on how the merge should be done.

pat = "|".join(df2['JOB_NAME'])
df1.insert(0, 'merge_key', df1['Description'].str.extract("("+pat+")", expand=False))

df = df1.merge(df2, how='right', left_on='merge_key', right_on='JOB_NAME').drop('merge_key', axis=1)
print(df)
       Nº.do Incidente Status                                  Description  jobName                      Per_Extracao                                 JOB_NAME   JOB_STREAM_NAME
0        NaN              NaN                                          NaN      NaN                               NaN  PP_AACD_NR_D8706_TIHIBRIDA_PROC_EXCUC_D  P26_AACD_FAC_TOD
1  ENCERRADO             GR26  D.I.T.I. >>> ABEND NO JOB PP_SASG_GD9822...      NaN  DE : 2022/01/05 ATÉ : 2022/12/08                           PP_SASG_GD9822      P26_AACE_U08
2  ENCERRADO             GR26  D.I.T.I. >>> ABEND NO JOB PP_AACE_R4539 ...      NaN  DE : 2022/01/05 ATÉ : 2022/12/08                            PP_AACE_R4539      P26_AACE_U09
3        NaN              NaN                                          NaN      NaN                               NaN                  PP_AACE_R4539_CONS_JUNC      P26_AACE_U08
4        NaN              NaN                                          NaN      NaN                               NaN                 PP_AACE_R4539_FMRC_TD_01      P26_AACE_U08
5        NaN              NaN                                          NaN      NaN                               NaN                 PP_AACE_R4539_FMRC_TD_02      P26_AACE_U08

Rabinzel
  • 7,757
  • 3
  • 10
  • 30
  • the only thing @Rabinzel in this case is this: df1.insert(0, 'merge_key', df1['Description'].str.extract("("+pat+")", expand=False)) If I have a value in "pat" that doesnt have a match in the "Description" column, it won't bring that value to the merged df. and I need everything from pat not from the match in that piece of code. – gfernandes Aug 25 '22 at 18:19
  • thats not true, see your desired output and mine, it is the same. When we insert the column, everywhere where no match was found will be a `NaN`. Then we merge df1 with df2 but this time we use `how='right'`, which means every row of df2 will be in the result, no matter if there is a match or not. Have a look at [merge101](https://stackoverflow.com/questions/53645882/pandas-merging-101), and see what it means to have left, right, inner or outer merge. – Rabinzel Aug 25 '22 at 18:28
  • yes, because there it's just a sample of my dataset, let me give a sample: if we have in pat the joName = AAAAAA and we don't have any value in the Description column with this, that means that the jobName AAAAAA won't be inserted in the df1 because there weren't match, right? – gfernandes Aug 25 '22 at 18:29
  • If you merge df1 with df2 and do a right merge, no matter what matches or not, every row of df2 will be in the result. if you do a left merge, every row of df1 will be in the result. in an inner merge, only the rows gets displayed were df1 and df2 have a match. – Rabinzel Aug 25 '22 at 18:31
  • if we have a jobname with `AAAA` and we don't have it in the description, then this row will still be in the final output, just as `NaN` – Rabinzel Aug 25 '22 at 18:33
  • considering you last comment, in the final output the jobName value would be NaN for the jobName AAAA or the rest of the row? sorry, didn't get this part – gfernandes Aug 25 '22 at 18:34
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/247566/discussion-between-gfernandes-and-rabinzel). – gfernandes Aug 25 '22 at 18:35