0

I have a dataset that has multiple location and the number of visits by week. I want to filter out the locations that did not have any visitors in the most recent week.

I have tried the following:

df_select = df.loc[(df['end_date'] == "3/17/2023") & (df['visits'] != 0)]
df_group = df.groupby(['id','name','lat', 'lng']).agg({'visits': ['count']})

When I try this I get the following error: “ValueError: cannot reindex from a duplicate axis”

James
  • 32,991
  • 4
  • 47
  • 70

1 Answers1

0

One of the filtering columns in your df_select assignment (namely end_date or visits) is likely a duplicate column label. If you can somehow make these column labels unique, the problem should be solved.

Similar issues are discussed here.

This phenomenon can be illustrated by this short example:

df = pd.DataFrame({i:list(range(3)) for i in range(10,14)})
df_select = df.loc[(df[11] == 1) & (df[12] != 0)]

Input:

   10  11  12  13
0   0   0   0   0
1   1   1   1   1
2   2   2   2   2

Output:

   10  11  12  13
1   1   1   1   1

Now, here it is with a duplicate column label 12:

df = pd.DataFrame({i:list(range(3)) for i in range(10,14)})
df.columns=[min(col, 12) for col in df.columns]
df_select = df.loc[(df[11] == 1) & (df[12] != 0)]

Input:

   10  11  12  12
0   0   0   0   0
1   1   1   1   1
2   2   2   2   2

Output:

...

ValueError: cannot reindex on an axis with duplicate labels
constantstranger
  • 9,176
  • 2
  • 5
  • 19