0

Working with python 3.7 and pandas. I have two dataframes, one has information on a list of samples, 30000 rows and a log of the error recorded along with some information about the sample. The second dataframe is a dictionary of error number, contains more information about that error. There are 5000 rows The first dataframe has the following format:


  Sample_ID Error_ID   Material    Mass (kg)  Length (cm)
     1       0004       steel        54          132
     2       0002       wood         66          103
     3       0003       aluminium    51          166
     4       0002       plastic      35          144
     5       0004       aluminium    41          156
     6       0002       granite      200         199

The second dataframe:

Error_ID       Info
  0001         error 1
  0002         error 2
  0003         error 3
  0004         error 4

I want to know if there is a straightforward way of creating a dataframe which is the same as the first but then appends the relevant error information to each row. Which doesn't require a loop through every element of dataframe 1.

Desired output:

  Sample_ID Error_ID   Material    Mass (kg)  Length (cm)  Error_ID  Info
     1       0004       steel        54          132        0004     error 4
     2       0002       wood         66          103        0002     error 2
     3       0003       aluminium    51          166        0003     error 3
     4       0002       plastic      35          144        0002     error 2
     5       0004       aluminium    41          156        0004     error 4
     6       0002       granite      200         199        0002     error 2

Here is the code to create the first 2 dataframes:

data_1 = [['1', '0004', 'steel', 54, 132], ['2', '0002', 'wood', 66, 103], ['3', '0003', 'aluminium', 51, 166], ['4', '0002', 'plastic', 35, 144], ['5', '0004', 'aluminium', 41, 156], ['6', '0002', 'granite', 200, 199]]
df_1 = pd.DataFrame(data_1, columns=['Sample_ID', 'Error_ID', 'Material', 'Mass (kg)', 'Length (cm)'])

data_2 = [['0001', 'error 1'], ['0002', 'error 2'], ['0003', 'error 3'], ['0004', 'error 4']]
df_2 = pd.DataFrame(data_2, columns=['Error_ID', 'Info'])

Thank you

Agustin
  • 1,458
  • 1
  • 13
  • 30
  • 1
    I think you need merge with left join, if not working create [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve). – jezrael Oct 08 '19 at 09:30
  • @jezrael I have added an example as I cannot figure our how left join works for this. Thanks – Agustin Oct 08 '19 at 09:55

0 Answers0