Pandas - Removing duplicate column names, but appending their unique row values to existing columns

Question

this is my first time using pandas and this forum.

I have a table of scientific data with behavioral observations made, where the first column represents the type of observations and the rows are the corresponding behavioral scores. These columns are non-descript, and are varying across experiments.

My table looks something like this:

| -------- ----- | -------------- | --- ----------

| Date ............. |..................

| Arm movement | 1 ................. | 2 .............. | 2 ..... |

| Leg movement | 2 .............. | 1................. | 1......... |

| Head movement| 1 ............. | 1 ............... | 2..........
| Time ............... | 0:05 ......... | 0:10 .......... | 0:15 .... |

| Date |

| Arm movement | 3 ............. | 0 ............... | 2 .........
| Leg movement | 2 .............. | 3 ............... | 1 .........
| Head movement| 1 ...... ...... | 1 ................ | 3 ........
| Time ................. | 0:12 ........ | 0:17 .......... | 0:22 .....

My idea was to transpose this table, since it seems to me that pandas is far more powerful when working with operations relating to the columns. However, this leaves me with a table where the column names are repeated but with unique row values. Is there a way for me to keep only the unique column names, and insert the values from the repeated columns into the unique columns?

I've tried using the functions: stack, unstack, reshape and pivot, but with no success.

Please, check these two guides and update your question accordingly. Else it will be close. https://stackoverflow.com/help/minimal-reproducible-example and https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — alec_djinn, Sep 23 '21 at 10:59
you likely want to `.combine` or `.apply` a concatenating function across the columns and then insert them. If you add an MRE we can suggest an answer. — 2e0byo, Sep 23 '21 at 11:05

score -1 · Answer 1 · answered Sep 23 '21 at 11:07

The "tidy" solution or "long form" solution is to convert to this form:

Date   | Time | Movement type |  Num
---------------------------------
1/1/21 | 0:05 | Arm           |   1 
1/1/21 | 0:05 | Leg           |   2
1/1/21 | 0:05 | Head          |   2 
1/1/21 | 0:10 | Arm           |   2

etc. You can build this using a loop over each of the numeric cells in your table, and adding an appropriate row.

However I'm not a great fan of this format, and I prefer "N-dimensional array". This would be useful if you have approximately the same number of measures per day, and the same intervals between the timings on each day. I'd use numpy and have a numeric array that is

num_movements [ NUM_DATES, NUM_TIMES, NUM_MOVT_TYPES=3 ]

The benefit is you can easily sum, mean, filter etc across dimensions of this ND array. But I am probably biased...

Pandas - Removing duplicate column names, but appending their unique row values to existing columns

1 Answers1