I have the following df:
import pandas as pd

df = pd.DataFrame(
    [
        ["fileA", "Users;user;Downloads;folder1"],
        ["fileA", "Users;user;Downloads;folder2"],
        ["fileA", "Users;user;Downloads;folder1"],
        ["fileB", "Users;user;Documents;folder1"],
        ["fileB", "Users;user;Documents;folder1"],
        ["fileB", "Users;user;Downloads;folder1"],
    ],
    columns=['file', 'path'],
)
and for every unique value in file, I want to make a list of strings of all corresponding values in path.
fileA ["Users;user;Downloads;folder1", "Users;user;Downloads;folder2", "Users;user;Downloads;folder1"]
fileB ["Users;user;Documents;folder1", "Users;user;Documents;folder1", "Users;user;Downloads;folder1"]
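One way this grouping might be written is with `groupby` followed by `agg(list)`. This is only a minimal sketch under the assumption that the result should be a Series indexed by file; the name `paths_per_file` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["fileA", "Users;user;Downloads;folder1"],
        ["fileA", "Users;user;Downloads;folder2"],
        ["fileA", "Users;user;Downloads;folder1"],
        ["fileB", "Users;user;Documents;folder1"],
        ["fileB", "Users;user;Documents;folder1"],
        ["fileB", "Users;user;Downloads;folder1"],
    ],
    columns=["file", "path"],
)

# Collect every path that belongs to the same file into one Python list.
# paths_per_file is a hypothetical name; the result is a Series indexed by file.
paths_per_file = df.groupby("file")["path"].agg(list)
print(paths_per_file)
```

Duplicates are kept and the original row order within each file is preserved, which matches the desired output shown above.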
The final aim is to apply the following function to each row in column path:
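from itertools import takewhile

def allnamesequal(name):
    # True if every component at this directory level is identical
    return all(n == name[0] for n in name[1:])

def commonprefix(paths, sep=';'):
    # Transpose the split paths so each tuple holds one directory level
    bydirectorylevels = zip(*[p.split(sep) for p in paths])
    # Keep leading levels while all paths agree, then join them back
    return sep.join(x[0] for x in takewhile(allnamesequal, bydirectorylevels))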
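To make the final aim concrete, here is a hedged sketch of how `commonprefix` could be applied once the paths are grouped per file. It assumes the grouped lists come from `groupby(...).agg(list)`; the names `paths_per_file` and `common` are illustrative and not part of the original code.

```python
from itertools import takewhile

import pandas as pd


def allnamesequal(name):
    return all(n == name[0] for n in name[1:])


def commonprefix(paths, sep=";"):
    bydirectorylevels = zip(*[p.split(sep) for p in paths])
    return sep.join(x[0] for x in takewhile(allnamesequal, bydirectorylevels))


df = pd.DataFrame(
    [
        ["fileA", "Users;user;Downloads;folder1"],
        ["fileA", "Users;user;Downloads;folder2"],
        ["fileA", "Users;user;Downloads;folder1"],
        ["fileB", "Users;user;Documents;folder1"],
        ["fileB", "Users;user;Documents;folder1"],
        ["fileB", "Users;user;Downloads;folder1"],
    ],
    columns=["file", "path"],
)

# Assumed approach: one list of paths per file, then one prefix per list.
paths_per_file = df.groupby("file")["path"].agg(list)
common = paths_per_file.map(commonprefix)
print(common)
```

For the sample data this should give `Users;user;Downloads` for fileA and `Users;user` for fileB, since the fileB paths already diverge at the third level.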