1

I want to do a specific action in Python. Indeed, I have a CSV file where many data are specified into columns. What I want is to change them into rows. Nevertheless, I need to respect specific rules.

Here is the dataset format :

[... , 'N°Panneau', 'Sexe', 'Prénom_Nom_Cand1', 'Voix', '% Voix/Ins', '% Voix/Exp',
'col_28', 'col_29', 'Prénom_Nom_Cand2', 'col_32', 'col_33', 'col_34',
'col_35', 'col_36', 'Prénom_Nom_Cand3', 'col_39', 'col_40', 'col_41',
'col_42', 'col_43', 'Prénom_Nom_Cand4', 'col_46', 'col_47', 'col_48',
'col_49', 'col_50', 'Prénom_Nom_Cand5', 'col_53', 'col_54', 'col_55',
'col_56', 'col_57', 'Prénom_Nom_Cand6', 'col_60', 'col_61', 'col_62',
'col_63', 'col_64', 'Prénom_Nom_Cand7', 'col_67', 'col_68', 'col_69',
'col_70', 'col_71', 'Prénom_Nom_Cand8', 'col_74', 'col_75', 'col_76',
'col_77', 'col_78', 'Prénom_Nom_Cand9', 'col_81', 'col_82', 'col_83',
'col_84', 'col_85', 'Prénom_Nom_Cand10','col_88','col_89', 'col_90',
'col_91', 'col_92', 'Prénom_Nom_Cand11','col_95','col_96', 'col_97',
'col_98', 'col_99', 'Prénom_Nom_Cand12', 'col_102','col_103', 'col_104']

Where 'N°Panneau', 'Sexe', 'Prénom_Nom_Cand1', 'Voix', '% Voix/Ins', '% Voix/Exp' is the same columns as 'col_{number}', 'col_{number}', 'Prénom_Nom_Cand{number}', 'col_{number}', 'col_{number}', 'col_{number}'

I want to just have the columns 'N°Panneau', 'Sexe', 'Prénom_Nom_Cand1', 'Voix', '% Voix/Ins', '% Voix/Exp' containing all the col_... values etc.

How can I do that(get the columns as rows) like specified above?

Tokoro-San
  • 37
  • 1
  • 7

1 Answers1

2

Assuming df is your input wide-format DataFrame, you can use :

out = (
    pd.DataFrame(df.to_numpy().reshape(-1, 6), columns= df.columns[:6])
)

Output :

print(out)

    N°Panneau  Sexe  Prénom_Nom_Cand1  Voix  % Voix/Ins  % Voix/Exp
0           5     8                 9     5           0           0
1           1     7                 6     9           2           4
2           5     2                 4     2           4           7
..        ...   ...               ...   ...         ...         ...
57          9     7                 0     5           2           2
58          8     5                 0     5           9           8
59          6     6                 0     4           7           3

[60 rows x 6 columns]

Input used :

cols = [
    'N°Panneau', 'Sexe', 'Prénom_Nom_Cand1', 'Voix', '% Voix/Ins', '% Voix/Exp',
    'col_28', 'col_29', 'Prénom_Nom_Cand2', 'col_32', 'col_33', 'col_34',
    'col_35', 'col_36', 'Prénom_Nom_Cand3', 'col_39', 'col_40', 'col_41',
    ...
]

np.random.seed(1)

df = pd.DataFrame(np.random.randint(0, 10, (5, len(cols))), columns=cols)


   N°Panneau  Sexe  Prénom_Nom_Cand1  ...  col_102  col_103  col_104
0          5     8                 9  ...        6        8        0
1          2     7                 7  ...        5        4        0
2          7     8                 9  ...        3        8        3
3          5     6                 7  ...        9        1        2
4          0     4                 7  ...        4        7        3

[5 rows x 72 columns]

Update :

gh_url = "https://raw.githubusercontent.com/kivircik55/dataset/main/" \
         "resultats_par_niveau_burvot_t1_france_entiere_prepared_joined" \
         "_prepared%20(1).csv"

df = pd.read_csv(gh_url)

idx = df.columns.get_loc("Prénom_Nom_Cand1")-2

wide_blocks = df.iloc[:, idx:]

long_blocks = pd.DataFrame(
    wide_blocks.to_numpy().reshape(-1, 6),
    columns= wide_blocks.columns[:6]
)

out = (
    df.iloc[:, :idx].join(
        long_blocks.set_index(
            np.arange(len(df)).repeat(len(long_blocks.columns)*2)))
)

Output :

print(out)

    Année  Code du département Libellé du département  ...  Voix % Voix/Ins  % Voix/Exp
0    2022                   24               Dordogne  ...     2       0.23        0.39
0    2022                   24               Dordogne  ...    13       1.51        2.51
0    2022                   24               Dordogne  ...   139      16.13       26.89
..    ...                  ...                    ...  ...   ...        ...         ...
43   2022                   24               Dordogne  ...    39       4.23        5.86
43   2022                   24               Dordogne  ...     4       0.43         0.6
43   2022                   24               Dordogne  ...    16       1.74         2.4

[528 rows x 29 columns]
Timeless
  • 22,580
  • 4
  • 12
  • 30
  • What you've done is very helpful but it doesn't work because of columns that we have before what I specified. I've added ... before the columns. Do you know how can we do it ? – Tokoro-San Jun 20 '23 at 13:07
  • What does the `...` refer to ? Can you be more specific and/or provide a full [MRE](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) ? – Timeless Jun 20 '23 at 13:16
  • Here is a github link where you will find the csv file : https://github.com/kivircik55/dataset – Tokoro-San Jun 20 '23 at 13:26
  • The last block has only 4 columns (*instead of 6*) `["Prénom_Nom_Cand12", "col_102", "col_103", "col_104"]`. Is it a typo ? – Timeless Jun 20 '23 at 13:32
  • It's because you have 2 columns before Prénom_Nom_CandX et 3 after which form the 6 block – Tokoro-San Jun 20 '23 at 13:35
  • I added an update to the answer. Can you check to see if it's good ? – Timeless Jun 20 '23 at 14:07
  • Thank you you are the GOAT. You saved me for like a 2 hours struggling Have a nice day :) – Tokoro-San Jun 20 '23 at 14:30