
I have two CSV files. The first one is 'test1.csv':

+---------+---+---+
| user_id | A | B |
+---------+---+---+
|       1 | a | f |
|       2 | b | g |
|       3 | c | h |
|       4 | d | i |
|       5 | e | j |
+---------+---+---+

The second one is 'test2.csv':

+---------+---+---+
| user_id | C | D |
+---------+---+---+
|       1 | k | r |
|       2 | l | s |
|       4 | m | t |
|       5 | n | u |
|       6 | o | v |
|       7 | p | w |
|       8 | q | x |
+---------+---+---+

*Note that not every user_id in test1.csv appears in test2.csv, and vice versa, and the rows are not necessarily in order.

Desired output:

+---------+---+---+---+---+
| user_id | A | B | C | D |
+---------+---+---+---+---+
|       1 | a | f | k | r |
|       2 | b | g | l | s |
|       4 | d | i | m | t |
|       5 | e | j | n | u |
+---------+---+---+---+---+

So in essence, I want to merge the two pandas DataFrames on user_id and drop any user_id that is not present in both files.
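
To make the goal concrete, here is a minimal sketch of what I am after, assuming an inner join via pandas' `DataFrame.merge` is the right tool. The `io.StringIO` objects are only stand-ins for the two files (I am assuming they are plain comma-separated with a header row); with the real files you would pass 'test1.csv' and 'test2.csv' to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Inline stand-ins for test1.csv and test2.csv (assumed comma-separated with a header row)
test1_csv = io.StringIO("user_id,A,B\n1,a,f\n2,b,g\n3,c,h\n4,d,i\n5,e,j\n")
test2_csv = io.StringIO("user_id,C,D\n1,k,r\n2,l,s\n4,m,t\n5,n,u\n6,o,v\n7,p,w\n8,q,x\n")

df1 = pd.read_csv(test1_csv)
df2 = pd.read_csv(test2_csv)

# how="inner" keeps only the user_id values that appear in both frames,
# so ids 3, 6, 7 and 8 are dropped
merged = df1.merge(df2, on="user_id", how="inner")

print(merged)
```

The row order of the inputs should not matter here, since the merge matches on the user_id values rather than on row position.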

Any help is appreciated :)
