To get the items as individual strings, amend your first line of code so that it aggregates each group's items into a list of strings instead of joining them into a single string.
You currently get each group's items concatenated into a single string, e.g. 'item1, item2, item3' (with one pair of quotes around the whole thing rather than around each item), instead of distinct strings such as 'item1', 'item2', 'item3'. This happens because ','.join(....) inside the .apply() function joins the individual strings of each IP group into one string.
Amend your first line of code as follows:
# Collect the unique, non-null URLs of each IP group as a list of separate strings
d3 = df2.groupby('IP')['URL'].apply(lambda x: x.dropna().unique().tolist()).reset_index()
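A minimal side-by-side sketch of the difference, run on a small hypothetical df2. The ','.join(...) line is only an assumption about what the original first line looked like, since the question's exact code isn't reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's df2
df2 = pd.DataFrame({'IP': [1, 1, 2, 2],
                    'URL': ['item1', 'item2', 'item3', np.nan]})

# Assumed shape of the original line: ','.join() collapses each group into ONE string
joined = df2.groupby('IP')['URL'].apply(lambda x: ','.join(x.dropna().unique()))

# Amended version: .tolist() keeps each URL as a separate string inside a list
listed = df2.groupby('IP')['URL'].apply(lambda x: x.dropna().unique().tolist())

print(joined.tolist())  # ['item1,item2', 'item3']
print(listed.tolist())  # [['item1', 'item2'], ['item3']]
```

The only change is what the lambda returns: a str makes each group a single value, while a list keeps the items apart.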
You can also simplify your code for extracting the list of strings by replacing the loop with a single line:
dataset = d3['URL'].tolist()
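The loop being replaced isn't shown here, so the loop in this sketch is a hypothetical stand-in. It simply checks that Series.tolist() gives the same list of lists as collecting the column's values one by one; d3 is built directly with the shape the amended groupby line produces.

```python
import pandas as pd

# Hypothetical d3: one row per IP, URL column holding lists of strings
d3 = pd.DataFrame({'IP': [1, 2], 'URL': [['item1', 'item2'], ['item3']]})

# Assumed loop-based extraction (a stand-in for the question's loop)
dataset_loop = []
for urls in d3['URL']:
    dataset_loop.append(urls)

# One-line replacement
dataset = d3['URL'].tolist()
print(dataset == dataset_loop)  # True
```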
Demo
Input data:
import numpy as np
import pandas as pd
data = {'IP': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
        'URL': ['item1', 'item2', 'item3', 'item3', 'item4', np.nan, 'item4', 'item5', 'item5', np.nan, 'item6']}
df = pd.DataFrame(data)
print(df)
    IP    URL
0    1  item1
1    1  item2
2    1  item3
3    1  item3
4    2  item4
5    2    NaN
6    2  item4
7    3  item5
8    3  item5
9    3    NaN
10   3  item6
Output, after running the two amended lines above on this df (i.e. with df2 = df):
print(dataset)
[['item1', 'item2', 'item3'], ['item4'], ['item5', 'item6']]
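For convenience, the whole demo can be run as one script. This sketch assumes the question's df2 has the same shape as the demo's df, so it builds df2 directly from the demo data.

```python
import numpy as np
import pandas as pd

# Demo input from above; assumes the question's df2 looks like this
df2 = pd.DataFrame({'IP': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
                    'URL': ['item1', 'item2', 'item3', 'item3', 'item4', np.nan,
                            'item4', 'item5', 'item5', np.nan, 'item6']})

# Unique, non-null URLs per IP, kept as a list of separate strings
d3 = df2.groupby('IP')['URL'].apply(lambda x: x.dropna().unique().tolist()).reset_index()

# One list of URL strings per IP group
dataset = d3['URL'].tolist()
print(dataset)  # [['item1', 'item2', 'item3'], ['item4'], ['item5', 'item6']]
```

unique() keeps the order of first appearance, which is why 'item3' and 'item5' appear only once and in their original positions.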