Formatting data in python

Question

I want to format the data into a dataframe or file with specific format. I have extracted the data from ADS Keysight. But the data structure is not useful for further processing. The data file has R(1,1) to R(2,2), L(1,1) to L(2,2), G(1,1) to G(2,2), C(1,1) to C(2,2) data for different frequencies. The file has only 2 columns 1st column is "Freq" and 2nd column is either one of the RLGC data. I have given an example here only foe 3 frequency for each of the data. And the target format as well.

The real file contains 3000 frequencies of data.

The data file is (The two columns are tab separated):

freq    R(1,1)
0.00E+00    5.11E+00
1.00E+06    5.59E+00
2.00E+06    5.69E+00


freq    R(1,2)
0.00E+00    7.13E-13
1.00E+06    9.82E-03
2.00E+06    2.47E-02


freq    R(2,1)
0.00E+00    7.13E-13
1.00E+06    9.82E-03
2.00E+06    2.47E-02


freq    R(2,2)
0.00E+00    5.11E+00
1.00E+06    5.59E+00
2.00E+06    5.69E+00


freq    L(1,1)
0.00E+00    6.61E-07
1.00E+06    6.97E-07
2.00E+06    7.06E-07


freq    L(1,2)
0.00E+00    4.46E-07
1.00E+06    4.47E-07
2.00E+06    4.48E-07


freq    L(2,1)
0.00E+00    4.46E-07
1.00E+06    4.47E-07
2.00E+06    4.48E-07


freq    L(2,2)
0.00E+00    6.61E-07
1.00E+06    6.97E-07
2.00E+06    7.06E-07


freq    G(1,1)
0.00E+00    1.04E-17
1.00E+06    6.42E-07
2.00E+06    1.29E-06


freq    G(1,2)
0.00E+00    -5.02E-18
1.00E+06    -3.11E-07
2.00E+06    -6.23E-07


freq    G(2,1)
0.00E+00    -5.02E-18
1.00E+06    -3.11E-07
2.00E+06    -6.23E-07


freq    G(2,2)
0.00E+00    1.04E-17
1.00E+06    6.42E-07
2.00E+06    1.29E-06


freq    C(1,1)
0.00E+00    5.58E-11
1.00E+06    5.54E-11
2.00E+06    5.53E-11


freq    C(1,2)
0.00E+00    -3.27E-11
1.00E+06    -3.25E-11
2.00E+06    -3.25E-11


freq    C(2,1)
0.00E+00    -3.27E-11
1.00E+06    -3.25E-11
2.00E+06    -3.25E-11


freq    C(2,2)
0.00E+00    5.58E-11
1.00E+06    5.54E-11
2.00E+06    5.53E-11

The target file should look like below (The columns are also tab separated):

freq    R(1,1)  R(1,2)  R(2,1)  R(2,2)  L(1,1)  L(1,2)  L(2,1)  L(2,2)  G(1,1)  G(1,2)  G(2,1)  G(2,2)  C(1,1)  C(1,2)  C(2,1)  C(2,2)
0.00E+00    5.11E+00    7.13E-13    7.13E-13    5.11E+00    6.61E-07    4.46E-07    4.46E-07    6.61E-07    1.04E-17    -5.02E-18   -5.02E-18   1.04E-17    5.58E-11    -3.27E-11   -3.27E-11   5.58E-11
1.00E+06    5.59E+00    9.82E-03    9.82E-03    5.59E+00    6.97E-07    4.47E-07    4.47E-07    6.97E-07    6.42E-07    -3.11E-07   -3.11E-07   6.42E-07    5.54E-11    -3.25E-11   -3.25E-11   5.54E-11
2.00E+06    5.69E+00    2.47E-02    2.47E-02    5.69E+00    7.06E-07    4.48E-07    4.48E-07    7.06E-07    1.29E-06    -6.23E-07   -6.23E-07   1.29E-06    5.53E-11    -3.25E-11   -3.25E-11   5.53E-11

The target format can be a dataframe in python which I can manipulate further. But Can anyone help me how to do it with pandas?

I am using Anaconda (Spyder) for python package (which includes python 3.6.4).

jezrael · Accepted Answer · 2018-06-30T21:00:22.857

0

Use:

#create DataFrame from csv with columns f and v 
df = pd.read_csv(filename, sep="\s+", names=['freq','v'])

#boolean mask for identify columns of new df   
m = df['v'].str.endswith(')')
#new column by replace NaNs by forward filling
df['g'] = df['v'].where(m).ffill()
#get original ordering for new columns
cols = df['g'].unique()
#remove rows with same values in v and g columns
df = df[df['v'] != df['g']]
#reshape by pivoting with change ordering of columns by reindex
df = df.pivot('freq', 'g', 'v').rename_axis(None, axis=1).reindex(columns=cols).reset_index()
print (df)
       freq    R(1,1)    R(1,2)    R(2,1)    R(2,2)    L(1,1)    L(1,2)  \
0  0.00E+00  5.11E+00  7.13E-13  7.13E-13  5.11E+00  6.61E-07  4.46E-07   
1  1.00E+06  5.59E+00  9.82E-03  9.82E-03  5.59E+00  6.97E-07  4.47E-07   
2  2.00E+06  5.69E+00  2.47E-02  2.47E-02  5.69E+00  7.06E-07  4.48E-07   

     L(2,1)    L(2,2)    G(1,1)     G(1,2)     G(2,1)    G(2,2)    C(1,1)  \
0  4.46E-07  6.61E-07  1.04E-17  -5.02E-18  -5.02E-18  1.04E-17  5.58E-11   
1  4.47E-07  6.97E-07  6.42E-07  -3.11E-07  -3.11E-07  6.42E-07  5.54E-11   
2  4.48E-07  7.06E-07  1.29E-06  -6.23E-07  -6.23E-07  1.29E-06  5.53E-11   

      C(1,2)     C(2,1)    C(2,2)  
0  -3.27E-11  -3.27E-11  5.58E-11  
1  -3.25E-11  -3.25E-11  5.54E-11  
2  -3.25E-11  -3.25E-11  5.53E-11

Last for remove index values in output csv by to_csv use index=False parameter:

df.to_csv(file, index=False, sep='\t')

edited Jun 30 '18 at 21:00

answered Jun 30 '18 at 17:15

jezrael

822,522
95
1,334
1,252

thanks a lot. The target df should be different as I posted in the question. the df that it produces have lots of multiple arrays of data. Can you check that? – aguntuk Jun 30 '18 at 18:53
Can you check solution now? – jezrael Jun 30 '18 at 19:10
the data frame that it produces has multiple arrays of variables R, L, G, C parameters but it needs to be in one data frame of multiple RLGC variables. Can you see the target dataframe that I posted in the question? Another thing is new dataframe has an extra column in the beginning with line numbers, the line numbers also need to be removed. – aguntuk Jun 30 '18 at 19:41
it is still same. there is `\` which is creating the new line but it should be the continuation like `freq R(1,1) R(1,2) R(2,1) R(2,2) L(1,1) L(1,2) L(2,1) L(2,2) G(1,1) G(1,2) G(2,1) G(2,2) C(1,1) C(1,2) C(2,1) C(2,2)` – aguntuk Jun 30 '18 at 20:24
@aguntuk - It is only display issue, no problem if write to file. For prevent it check [this](https://stackoverflow.com/q/11707586/2901002) – jezrael Jun 30 '18 at 20:28
if i want to write in a file then after `f = open('new_file_name.csv','w')`? – aguntuk Jun 30 '18 at 20:34
then use `df.to_csv(f, index=False)`, but better is `df.to_csv('new_file_name.csv', index=False)` – jezrael Jun 30 '18 at 20:35
sorry I missed that. Now it works in a file. 3 problems in csv. 1) the output is in comma separated option, can it be in "TAB" separated while writing in file?? 2) The output header has quote sign in strings`"C(1,1)"` how to remove that. 3) The order in the file it saves in the column is alphabetical. if I want to save the way I put in the target(if you see the question). The output in csv in the text editor is: `freq,"C(1,1)","C(1,2)","C(2,1)","C(2,2)","G(1,1)","G(1,2)","G(2,1)","G(2,2)","L(1,1)","L(1,2)","L(2,1)","L(2,2)","R(1,1)","R(1,2)","R(2,1)","R(2,2)"` – aguntuk Jun 30 '18 at 20:48
@aguntuk - Check edited answer - need `df.to_csv('new_file_name.csv', index=False, sep='\t')` and for change ordering of columns `reindex` like `df = df.pivot('freq', 'g', 'v').rename_axis(None, axis=1).reindex(columns=cols).reset_index()` – jezrael Jun 30 '18 at 21:01

Formatting data in python

1 Answers1