0

I'm new to python and dataframes so I am wondering if someone knows how I could accomplish the following. I have a dataframe with many columns, some which share a beginning and have an underscore followed by a number (bird_1, bird_2, bird_3). I want to essentially merge all of the columns that share a beginning into singular columns with all the values that were contained in the constituent columns. Then I'd like to run df[columns].value_counts for each.

Initial dataframe

Initial df

Final dataframe

Final df

For df[bird].value_counts(), I would get a count of 1 for A-L
For df[cat].value_counts(), I would get a count of 3 for A, 4 for B, 1 for C

The ultimate goal is to get a count of unique values for each column type (bird, cat, dog, etc.)

dejanualex
  • 3,872
  • 6
  • 22
  • 37

1 Answers1

0

You can do:

df.columns=[col.split("_")[0] for col in df.columns]

df=df.unstack().reset_index(1, drop=True).reset_index()
df["id"]=df.groupby("index").cumcount()
df=df.pivot(index="id", values=0, columns="index")

Outputs:

index bird  cat
id
0        A    A
1        B    A
2        C    A
3        D    B
4        E    B
5        F    B
6        G    B
7        H    C
8        I  NaN
9        J  NaN
10       K  NaN
11       L  NaN

From there to get counts of all possible values:

df.T.stack().reset_index(1, drop=True).reset_index().groupby(["index", 0]).size()

Outputs:

index  0
bird   A    1
       B    1
       C    1
       D    1
       E    1
       F    1
       G    1
       H    1
       I    1
       J    1
       K    1
       L    1
cat    A    3
       B    4
       C    1
dtype: int64
Grzegorz Skibinski
  • 12,624
  • 2
  • 11
  • 34