-2

I have a CSV file with hundreds of rows like shown below! How can I merge these two rows in such a way that only second column value gets updated to CVE-2017-3006, CVE-2017-3007 and the rest remains the same

enter image description here

Zaid Waseem
  • 341
  • 1
  • 4
  • 13
  • 2
    Does this answer your question? [pandas - Merge nearly duplicate rows based on column value](https://stackoverflow.com/questions/36271413/pandas-merge-nearly-duplicate-rows-based-on-column-value) – vahdet Feb 07 '20 at 14:13

1 Answers1

1
import pandas as pd

# Create a dummy dataframe like yours
df = pd.DataFrame([
    {"C1": "0", "C2": "A", "C3": "9.0", "C4": "High", "C5": "zaid", "C6": "TCP", "C7": "445", "C8": "some_text", "C9": "some_other_text"},
    {"C1": "0", "C2": "B", "C3": "9.0", "C4": "High", "C5": "zaid", "C6": "TCP", "C7": "445", "C8": "some_text", "C9": "some_other_text"},
    {"C1": "1", "C2": "A", "C3": "17.0", "C4": "High", "C5": "zaid", "C6": "TCP", "C7": "445", "C8": "some_text", "C9": "some_other_text"},
    {"C1": "1", "C2": "B", "C3": "17.0", "C4": "High", "C5": "zaid", "C6": "TCP", "C7": "445", "C8": "some_text", "C9": "some_other_text"},
    {"C1": "1", "C2": "C", "C3": "17.0", "C4": "High", "C5": "zaid", "C6": "TCP", "C7": "445", "C8": "some_text", "C9": "some_other_text"},
])

# Group by all columns you want to keep, and aggregate C2 into a list
grouped = df.groupby(["C1","C3","C4","C5","C6","C7","C8","C9"], as_index=False).agg({
    "C2":lambda x:x.tolist()
})
print(grouped)

      C1    C3    C4    C5   C6   C7         C8               C9         C2
0  0   9.0  High  zaid  TCP  445  some_text  some_other_text     [A, B]
1  1  17.0  High  zaid  TCP  445  some_text  some_other_text  [A, B, C]
tdpr
  • 212
  • 1
  • 4