0

I am trying to create datasets from the column names of a dataframe, which has the columns ['NAME1', 'EMAIL1', 'NAME2', 'EMAIL2', 'NAME3', 'EMAIL3', etc].

I'm trying to split the dataframe on each 'EMAIL' column using a loop, but it's not working properly.

I need the result as JSON, because the number of columns between one 'EMAILn' column and the next may differ.

This is my input: (image not available)

I need this: (image not available)

This is my code:

for i in df_entities.filter(regex=('^(EMAIL)' + str(i))).columns: 
    df_groups = df_temp_1.groupby(i)
    df_detail = df_groups.get_group(i)
    display(df_detail)

What do you recommend?

Thank you very much in advance.

Regards

Gonza
  • Please provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) – BeRT2me Jun 03 '22 at 20:48

2 Answers

1

filter returns a copy of your dataframe with only the matching columns, but you're trying to loop over just the column names. Just add .columns:

for i in df_entities.filter(regex=('^(Email)' + str(i))).columns: 
    ...                                               # ^^^^^^^^^ important
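
A minimal sketch of that difference, using made-up sample data (the frame contents and the names `filtered`, `email_columns` and `parts` are assumptions for illustration, not the asker's real data). It shows that `filter` returns a DataFrame while `.columns` returns the labels, and it selects sub-frames by column label rather than with `get_group`, which looks up values inside a column:

```python
import pandas as pd

# Hypothetical sample data standing in for df_entities
df_entities = pd.DataFrame({
    "NAME1": ["Ann", "Bob"],
    "EMAIL1": ["ann@x.com", "bob@x.com"],
    "NAME2": ["Cy", "Di"],
    "EMAIL2": ["cy@x.com", "di@x.com"],
})

# filter() returns a DataFrame holding only the matching columns ...
filtered = df_entities.filter(regex=r"^EMAIL\d+$")
# ... while .columns gives the column labels to loop over
email_columns = list(filtered.columns)

# Build one sub-frame per numeric suffix, selecting by column label
parts = {}
for col in email_columns:
    suffix = col[len("EMAIL"):]  # "EMAIL1" -> "1"
    # Keep every column whose name is letters followed by this exact suffix
    parts[suffix] = df_entities.filter(regex=rf"^[A-Za-z_]+{suffix}$")

for suffix, part in parts.items():
    print(suffix, list(part.columns))
```

Each entry in `parts` can then be serialized on its own, for example with `part.to_json(orient="records")`.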
  • OK, I understand, but I keep getting the same error (`KeyError: 'Email18'`). The Email18 column does exist, but none of its values is 'Email18', and that is apparently the problem (I would have to pass a value from the column, not the column name). However, I need to split the dataframe based on the column name. – Gonza Jun 03 '22 at 21:09
1

From your input and desired output, simply call pandas.wide_to_long:

long_df = pd.wide_to_long(
    df_entities.reset_index(), 
    stubnames=["NAME", "EMAIL"],
    i="index",
    j="version"
)
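
As a hedged, self-contained sketch of the call above, the following runs it on made-up sample data (the frame contents are assumptions, since the original input was only posted as an image) and then serializes the long result to JSON records:

```python
import json

import pandas as pd

# Hypothetical sample data standing in for df_entities
df_entities = pd.DataFrame({
    "NAME1": ["Ann", "Bob"],
    "EMAIL1": ["ann@x.com", "bob@x.com"],
    "NAME2": ["Cy", "Di"],
    "EMAIL2": ["cy@x.com", "di@x.com"],
})

# reset_index() adds an "index" column that identifies each original row
long_df = pd.wide_to_long(
    df_entities.reset_index(),
    stubnames=["NAME", "EMAIL"],
    i="index",
    j="version",
)

# Flatten the (index, version) MultiIndex and emit one JSON object per row
flat = long_df.reset_index()
json_text = flat.to_json(orient="records")
records = json.loads(json_text)
print(records)
```

Each record carries `index`, `version`, `NAME` and `EMAIL`, so rows from different `NAMEn`/`EMAILn` pairs end up in one uniform structure.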
Parfait
  • Thank you for your comment, Parfait. I didn't know this function, but there may be more columns (I don't know how many), which is why I need a JSON format. Thanks – Gonza Jun 04 '22 at 01:33
  • Again, I only went by your input and output. You can add more columns to the `i` argument. I do not understand what you mean by *json format*. Please provide more sample data (not images) using a [pandas reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Parfait Jun 04 '22 at 01:41