2

I've got a growing data set (let's call it ADC) where each row looks something like this:

AK1,2018-03-27,22:42:21,AKDT,Running Feedback calls in a loop,Iteration 1,AIN0(SE),0.091 ,AIN1(SE),-0.007 ,AIN2(SE),-0.008 ,AIN3(SE),3.659 ,Temperature,283.824,Iteration 2,AIN0(SE),0.096 ,AIN1(SE),-0.007 ,AIN2(SE),-0.003 ,AIN3(SE),3.649 ,Temperature,283.824,Iteration 3 ...

through Iteration 5. Picture in case it doesn't make sense inline

Thus you would have 5 entire columns of "AIN0", "AIN1", so on and so forth adjacent 5 columns of value of the that measurement. I'd like to make a new row for each iteration that retains the values of the first 5 columns from the original row and removes the AIN*, Temperature columns, making them the column names for the values that currently follow each one, so it looks like this. I've tried various methods using reshape, stack, wide_to_long, but it never quite works.

ADC.melt(id_vars=ADC.columns[0:4]) gives me something like this

pd.wide_to_long(ADC, stubnames="Iteration", i=[ADC.iloc[:,1:4]], j="Number")

throws this error: ValueError: stubname can't be identical to a column name

Where am I going wrong?

EDIT: Thanks for the comment @DJK, here is a copy-pastable, truncated version of the data frame:

AK1,2018-03-27,22:42:21,AKDT,Running Feedback calls in a loop,Iteration 1,AIN0(SE),0.091,AIN1(SE),-0.007,AIN2(SE),-0.008,AIN3(SE),3.659,Temperature,283.824,Iteration 2,AIN0(SE),0.096,AIN1(SE),-0.007,AIN2(SE),-0.003,AIN3(SE),3.649,Temperature,283.824
AK1,2018-03-27,22:47:05,AKDT,Running Feedback calls in a loop,Iteration 1,AIN0(SE),0.101,AIN1(SE),-0.007,AIN2(SE),-0.003,AIN3(SE),3.725,Temperature,285.634,Iteration 2,AIN0(SE),0.101,AIN1(SE),-0.007,AIN2(SE),-0.003,AIN3(SE),3.725,Temperature,285.634
AK1,2018-03-28,00:32:58,AKDT,Running Feedback calls in a loop,Iteration 1,AIN0(SE),0.116,AIN1(SE),1.399,AIN2(SE),-0.008,AIN3(SE),3.872,Temperature,290.462,Iteration 2,AIN0(SE),0.121,AIN1(SE),1.399,AIN2(SE),-0.003,AIN3(SE),3.933,Temperature,290.462
AK1,2018-03-28,09:33:19,AKDT,Running Feedback calls in a loop,Iteration 1,AIN0(SE),0.101,AIN1(SE),1.399,AIN2(SE),0.012,AIN3(SE),3.71,Temperature,258.479,Iteration 2,AIN0(SE),0.101,AIN1(SE),1.405,AIN2(SE),0.012,AIN3(SE),3.705,Temperature,258.479

I would really like it to look like (a few extra spaces added at the beginning to maybe help with readability):

Name, Date,Time,TZ,Process,Iteration,AIN0(SE),AIN1(SE),AIN2(SE),AIN3(SE),Temperature
AK1,  2018-03-27, 22:42:21, AKDT,Running Feedback calls in a loop,1,0.091,-0.007,-0.008,3.659,283.824
AK1,  2018-03-27, 22:42:21, AKDT,Running Feedback calls in a loop,2,0.096,-0.007,-0.003,3.649,283.824
AK1,  2018-03-27, 22:47:05, AKDT,Running Feedback calls in a loop,1,0.101,-0.007,-0.003,3.725,285.634,
AK1,  2018-03-27, 22:47:05, AKDT,Running Feedback calls in a loop,2,0.101,-0.007,-0.003,3.725,285.634
AK1,  2018-03-28, 00:32:58, AKDT,Running Feedback calls in a loop,1,0.116,1.399,-0.008,3.872,290.462
AK1,  2018-03-28, 00:32:58, AKDT,Running Feedback calls in a loop,2,0.121,1.399,-0.003,3.933,290.462
AK1,  2018-03-28, 09:33:19, AKDT,Running Feedback calls in a loop,1,0.101,1.399,0.012,3.71,258.479,
AK1,  2018-03-28, 09:33:19, AKDT,Running Feedback calls in a loop,2,0.101,1.405,0.012,3.705,258.479

UPDATE: I found a solution by reading the CSV in to multiple data frames (one for each iteration) using pd.read_csv(usecols=) to select only the columns containing data and manually naming the columns. I then used pd.concat to combine the dataframes and manually assigned indecies.

import pandas as pd

adc_names=["name", "date", "time", "tz", "iteration", "AIN0(SE)", "AIN1(SE)", "AIN2(SE)", "AIN3(SE)", "temperature"]

Iteration1 = pd.read_csv('ADC_CLEAN.csv', index_col=False, header=None, usecols=[0,1,2,3,5,7,9,11,13,15], names=adc_names)
Iteration2 = pd.read_csv('ADC_CLEAN.csv', index_col=False, header=None, usecols=[0,1,2,3,16,18,20,22,24,26], names=adc_names)
Iteration3 = pd.read_csv('ADC_CLEAN.csv', index_col=False, header=None, usecols=[0,1,2,3,27,29,31,33,35,37], names=adc_names)
Iteration4 = pd.read_csv('ADC_CLEAN.csv', index_col=False, header=None, usecols=[0,1,2,3,38,40,42,44,46,48], names=adc_names)
Iteration5 = pd.read_csv('ADC_CLEAN.csv', index_col=False, header=None, usecols=[0,1,2,3,49,51,53,55,57,59], names=adc_names)

ADC  = pd.concat([Iteration1, Iteration2, Iteration3, Iteration4, Iteration5], ignore_index=True)
ADC.iteration = ADC.iteration.str.strip('Iteration ')
ADC = ADC.set_index(["name","date","time","tz","iteration"])
Chemicalex
  • 21
  • 2
  • In case anyone is confused the picture of the current dataset doesn't depict its full width since it is 63 columns wide and may have impacted viewing in the browser. – Chemicalex Apr 10 '18 at 00:50
  • You will probably receive more help with this issue if you create some toy data that you can add the post, versus using an image that we cannot use to recreate your data set. [This](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) post if very helpful for understanding how to write a great question using pandas – DJK Apr 10 '18 at 01:05
  • Thank you for the comment, I've updated the post to try to bring it more inline with those guidelines. – Chemicalex Apr 10 '18 at 17:52

0 Answers0