0

I'm new to Python and could use some help.

Say I have a dataset that looks like this:

      Serial Number        Source
0          AB100          Donatelle
1          AB200          Qure
2          AB100          Donatelle
3          AB200          Qure
4          AB100          Grand Avenue
5          AB200          Eagle Services
6          AB300          Donatelle
7          AB300          Donatelle
8          AB100          Qure
9          AB100          Eagle Services

And I need to add a column to it, like this:

       Serial Number        Source         SN Data Sources
0          AB100          Donatelle        Donatelle, Grand Avenue, Qure, Eagle Services
1          AB200          Qure             Qure, Eagle Services
2          AB100          Donatelle        Donatelle, Grand Avenue, Qure, Eagle Services
3          AB200          Qure             Qure, Eagle Services
4          AB100          Grand Avenue     Donatelle, Grand Avenue, Qure, Eagle Services
5          AB200          Eagle Services   Qure, Eagle Services
6          AB300          Donatelle        Donatelle
7          AB300          Donatelle        Donatelle
8          AB100          Qure             Donatelle, Grand Avenue, Qure, Eagle Services
9          AB100          Eagle Services   Donatelle, Grand Avenue, Qure, Eagle Services

My knowledge is still limited, pardon me.

I'm working with a 40k rows dataframe and I need to generate a column that brings all different sources that are present in the dataframe for the serial number in each row.

Can anybody help me out? Thanks and kind regards

W.Coyote
  • 25
  • 4

2 Answers2

0

You can groupby on "Serial Number" column and apply a list to "Source" column.

Next create a dictionary of the grouped df and convert it to a df.

Finally merge the dfs together and clean up the columns.

data = {
    "Serial Number": ["AB100", "AB200", "AB100", "AB200", "AB100", "AB200"],
    "Source": ["Donatelle", "Qure", "Grand Avenue", "Eagle Services", "Qure", "Grand Avenue"]
}
df = pd.DataFrame(data)

grouped_df = df.groupby("Serial Number")["Source"].apply(list).reset_index()
mapping = grouped_df.set_index("Serial Number")["Source"].to_dict()
mapping_df = pd.DataFrame.from_dict(mapping, orient="index").unstack().reset_index()

final_df = pd.merge(
    grouped_df,
    mapping_df,
    left_on="Serial Number",
    right_on="level_1"
).rename(columns={0: "Source", "Source": "SN Data Sources"}[["Serial Number", "Source", "SN Data Sources"]]

print(final_df)

  Serial Number          Source                       SN Data Sources
0         AB100       Donatelle       [Donatelle, Grand Avenue, Qure]
1         AB100    Grand Avenue       [Donatelle, Grand Avenue, Qure]
2         AB100            Qure       [Donatelle, Grand Avenue, Qure]
3         AB200            Qure  [Qure, Eagle Services, Grand Avenue]
4         AB200  Eagle Services  [Qure, Eagle Services, Grand Avenue]
5         AB200    Grand Avenue  [Qure, Eagle Services, Grand Avenue]
Jason Baker
  • 3,170
  • 2
  • 12
  • 15
0

Using groupby() and agg() with value_counts(), tnen merge(). Preserves the order of Serial Number and the order of occurrence of elements in Source from top to bottom:

df = pd.DataFrame(
    {'Serial Number': ['AB100', 'AB200', 'AB100', 'AB200', 'AB100', 'AB200', 'AB300', 'AB300', 'AB100', 'AB100'],
     'Source': ['Donatelle', 'Qure', 'Donatelle', 'Qure', 'Grand Avenue', 'Eagle Services', 'Donatelle', 'Donatelle',
                'Qure', 'Eagle Services']})

df = df.merge(df.groupby('Serial Number').agg(lambda x: ', '.join(x.value_counts().keys())),
              how='left', on='Serial Number', suffixes=('', '2')).rename(columns={'Source2': 'SN Data Sources'})
print(df)

Prints:

  Serial Number          Source                                SN Data Sources
0         AB100       Donatelle  Donatelle, Grand Avenue, Qure, Eagle Services
1         AB200            Qure                           Qure, Eagle Services
2         AB100       Donatelle  Donatelle, Grand Avenue, Qure, Eagle Services
3         AB200            Qure                           Qure, Eagle Services
4         AB100    Grand Avenue  Donatelle, Grand Avenue, Qure, Eagle Services
5         AB200  Eagle Services                           Qure, Eagle Services
6         AB300       Donatelle                                      Donatelle
7         AB300       Donatelle                                      Donatelle
8         AB100            Qure  Donatelle, Grand Avenue, Qure, Eagle Services
9         AB100  Eagle Services  Donatelle, Grand Avenue, Qure, Eagle Services
Алексей Р
  • 7,507
  • 2
  • 7
  • 18