
I have the following df:

import pandas as pd

df = pd.DataFrame([["fileA", "Users;user;Downloads;folder1"],
                   ["fileA", "Users;user;Downloads;folder2"],
                   ["fileA", "Users;user;Downloads;folder1"],
                   ["fileB", "Users;user;Documents;folder1"],
                   ["fileB", "Users;user;Documents;folder1"],
                   ["fileB", "Users;user;Downloads;folder1"]],
                  columns=['file', 'path'])

For every unique value in file, I want to build a list containing all the corresponding strings in path:

fileA   ["Users;user;Downloads;folder1", "Users;user;Downloads;folder2", "Users;user;Downloads;folder1"]
fileB   ["Users;user;Documents;folder1", "Users;user;Documents;folder1", "Users;user;Downloads;folder1"]

The final aim is to apply the following function to each of those lists in column path:

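from itertools import takewhile

def allnamesequal(name):
    # True if every entry at this directory level is the same
    return all(n == name[0] for n in name[1:])

def commonprefix(paths, sep=';'):
    # Regroup the split paths by directory level, then keep leading levels that all agree
    bydirectorylevels = zip(*[p.split(sep) for p in paths])
    return sep.join(x[0] for x in takewhile(allnamesequal, bydirectorylevels))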
Saraha
    Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby) – sushanth Nov 09 '20 at 10:06

1 Answer


You can use groupby with transform to join all the path strings within each file group, then drop the duplicate rows.
After that, split each joined string and keep only the unique path components in each row.

import re
import pandas as pd

df = pd.DataFrame([["fileA", "Users;user;Downloads;folder1"],
                   ["fileA", "Users;user;Downloads;folder2"],
                   ["fileA", "Users;user;Downloads;folder1"],
                   ["fileB", "Users;user;Documents;folder1"],
                   ["fileB", "Users;user;Documents;folder1"],
                   ["fileB", "Users;user;Downloads;folder1"]],
                   columns=['file', 'path'])


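# Join every path in a group with ',' and keep one row per group
grouped = df.groupby('file').transform(lambda x: ','.join(x)).drop_duplicates()
# Split on both separators and collect the distinct path components
grouped['unique'] = grouped.apply(lambda x: set(re.split(r"[;,]", x['path'])), axis=1)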
print(grouped)

This results in the following DataFrame (the order of the elements within each set may vary):

   path                                              unique
0  Users;user;Downloads;folder1,Users;user;Downlo...  {folder1, user, Downloads, Users, folder2}
3  Users;user;Documents;folder1,Users;user;Docume...  {Documents, folder1, user, Downloads, Users}
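
If the goal is the list of paths per file exactly as the question shows it, a minimal sketch is to aggregate the path column with list and then apply commonprefix to each list. The names paths_per_file and prefix_per_file are made up for this illustration, and the helper functions are copied from the question:

```python
from itertools import takewhile

import pandas as pd


def allnamesequal(name):
    # True if every entry at this directory level is the same
    return all(n == name[0] for n in name[1:])


def commonprefix(paths, sep=';'):
    # Regroup the split paths by directory level, keep leading levels that agree
    bydirectorylevels = zip(*[p.split(sep) for p in paths])
    return sep.join(x[0] for x in takewhile(allnamesequal, bydirectorylevels))


df = pd.DataFrame([["fileA", "Users;user;Downloads;folder1"],
                   ["fileA", "Users;user;Downloads;folder2"],
                   ["fileA", "Users;user;Downloads;folder1"],
                   ["fileB", "Users;user;Documents;folder1"],
                   ["fileB", "Users;user;Documents;folder1"],
                   ["fileB", "Users;user;Downloads;folder1"]],
                  columns=['file', 'path'])

# One list of path strings per unique file (hypothetical variable names)
paths_per_file = df.groupby('file')['path'].agg(list)

# Apply the question's function to each list
prefix_per_file = paths_per_file.apply(commonprefix)

print(paths_per_file)
print(prefix_per_file)
```

With the sample data, prefix_per_file should hold "Users;user;Downloads" for fileA and "Users;user" for fileB. Duplicated paths are kept in the lists, which matches the expected output in the question.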
Gal Perelman