Data Manipulation either in MATLAB or Python

Question

I want to manipulate data of text file either in MATLAB or python. My data file contains 3000 rows but I have posted here for just as an example of 4 rows of data. The data file has R, L, G, C data for different frequencies (here 3 frequency in 3 rows). Now I want to manipulate the data to another format (Let's say, the target file).

Now, Can anyone help me with what is the command or loop command that I need for manipulate the data for 3000 frequencies and at the end, to save it as new target text file.

The data file is:

FORMAT Freq R1:1    R1:2    R2:1    R2:2    L1:1    L1:2    L2:1    L2:2    G1:1    G1:2    G2:1    G2:2    C1:1    C1:2    C2:1    C2:2
0.00E+00    5.11E+00    7.13E-13    7.13E-13    5.11E+00    6.61E-07    4.46E-07    4.46E-07    6.61E-07    1.04E-17    -5.02E-18   -5.02E-18   1.04E-17    5.58E-11    -3.27E-11   -3.27E-11   5.58E-11
1.00E+06    5.59E+00    9.82E-03    9.82E-03    5.59E+00    6.97E-07    4.47E-07    4.47E-07    6.97E-07    6.42E-07    -3.11E-07   -3.11E-07   6.42E-07    5.54E-11    -3.25E-11   -3.25E-11   5.54E-11
2.00E+06    5.69E+00    2.47E-02    2.47E-02    5.69E+00    7.06E-07    4.48E-07    4.48E-07    7.06E-07    1.29E-06    -6.23E-07   -6.23E-07   1.29E-06    5.53E-11    -3.25E-11   -3.25E-11   5.53E-11

and target file is:

FORMAT Freq :   R1:1    R1:2    R2:1    R2:2
        L1:1    L1:2    L2:1    L2:2
        G1:1    G1:2    G2:1    G2:2
        C1:1    C1:2    C2:1    C2:2

0.00E+00    :   5.11E+00    7.13E-13    7.13E-13    5.11E+00
        6.61E-07    4.46E-07    4.46E-07    6.61E-07
        1.04E-17    -5.02E-18   -5.02E-18   1.04E-17
        5.58E-11    -3.27E-11   -3.27E-11   5.58E-11

1.00E+06    :   5.59E+00    9.82E-03    9.82E-03    5.59E+00
        6.97E-07    4.47E-07    4.47E-07    6.97E-07
        6.42E-07    -3.11E-07   -3.11E-07   6.42E-07
        5.54E-11    -3.25E-11   -3.25E-11   5.54E-11

2.00E+06    :   5.69E+00    2.47E-02    2.47E-02    5.69E+00
        7.06E-07    4.48E-07    4.48E-07    7.06E-07
        1.29E-06    -6.23E-07   -6.23E-07   1.29E-06
        5.53E-11    -3.25E-11   -3.25E-11   5.53E-11

I am using Anaconda (Spyder) for python package (which includes python 3.6.4). The code looks like what @xg.plt.py suggested which gives error because data is stored as strings. So I tried to change data type in different ways and it is giving me different errors. My code is given below:

import pandas as pd

df = pd.read_csv('Data_file.txt')
#df = df.apply(pd.to_numeric, errors='ignore')
#df = df.apply(pd.to_numeric, errors='coerce')
#df = df.apply(pd.to_numeric)
#df = df.infer_objects()
df = df.astype(float)

formatters = {col : ('{:.2e}\n'.format if col[-3:]=='2:2' else '{:.2e}'.format) for col in df.columns}
formatters[df.columns[0]] = '{:.2e} : '.format
header_formatters = {col : ('{} \n'.format if col[-3:]=='2:2' else '{} '.format) for col in df.columns}
header_formatters[df.columns[0]] = '{} : '.format

f = open('formatted.txt','w')
f.write(''.join([header_formatters[col](col) for col in df.columns])+'\n')
df.to_string(buf=f,formatters=formatters,header=False,index=False)
f.close()

If I don't change data type it gives me error like below:

ValueError: Unknown format code 'e' for object of type 'str'

If I use df = df.apply(pd.to_numeric, errors='ignore') or df = df.infer_objects() it gives me same error like above ValueError: Unknown format code 'e' for object of type 'str'

If I use df = df.apply(pd.to_numeric, errors='coerce') it runs but it coerce all the value to NaN in first column and results in formatted file like below:

FORMAT Freq R1:1    R1:2    R2:1    R2:2    L1:1    L1:2    L2:1    L2:2    G1:1    G1:2    G2:1    G2:2    C1:1    C1:2    C2:1    C2:2 : 
nan : 
nan : 
nan :

if I use df = df.apply(pd.to_numeric) it give me following error:

ValueError: ('Unable to parse string "0.00E+00\t5.11E+00\t7.13E-13\t7.13E-13\t5.11E+00\t6.61E-07\t4.46E-07\t4.46E-07\t6.61E-07\t1.04E-17\t-5.02E-18\t-5.02E-18\t1.04E-17\t5.58E-11\t-3.27E-11\t-3.27E-11\t5.58E-11" at position 0', 'occurred at index FORMAT Freq\tR1:1\tR1:2\tR2:1\tR2:2\tL1:1\tL1:2\tL2:1\tL2:2\tG1:1\tG1:2\tG2:1\tG2:2\tC1:1\tC1:2\tC2:1\tC2:2')

When I type df.dtypesto check the dataframe type it gives me following result:

FORMAT Freq\tR1:1\tR1:2\tR2:1\tR2:2\tL1:1\tL1:2\tL2:1\tL2:2\tG1:1\tG1:2\tG2:1\tG2:2\tC1:1\tC1:2\tC2:1\tC2:2    float64
dtype: object

if I use df = df.astype(float) it give me following error:

ValueError: could not convert string to float: '2.00E+06\t5.69E+00\t2.47E-02\t2.47E-02\t5.69E+00\t7.06E-07\t4.48E-07\t4.48E-07\t7.06E-07\t1.29E-06\t-6.23E-07\t-6.23E-07\t1.29E-06\t5.53E-11\t-3.25E-11\t-3.25E-11\t5.53E-11'

I suggest you start by defining a class or struct object that holds a "row" of data. Then another object holding a list of those — OneCricketeer, May 11 '18 at 00:19

score 0 · Accepted Answer · answered May 11 '18 at 00:52

One option is to use the python package pandas.

The first step would be to load the data into a dataframe, i.e. with pd.read_csv(). pandas has an extensive set of I/O tools.

Then, write the dataframe to a file. As the target format is quite unusual, the best option is probably to use the function to_string with a default formatter:

formatters = {col : ('{:.2e}\n'.format if col[-3:]=='2:2' else '{:.2e}'.format) for col in df.columns}
formatters[df.columns[0]] = '{:.2e} : '.format
header_formatters = {col : ('{} \n'.format if col[-3:]=='2:2' else '{} '.format) for col in df.columns}
header_formatters[df.columns[0]] = '{} : '.format

There is one formatter for the numeric values, and another for the header. The numeric formatter basically tells pandas to write the numbers in exponential notation, with 2 decimal positions, and if the column finishes with '2:2', add a breakline after the number. The header formatter simply has the breakline when desired. Both write a colon after the number in the first column. See PyFormat for more information.

Then, a file can be opened and the formatted string written to it:

f = open('filename.txt','w')
f.write(''.join([header_formatters[col](col) for col in df.columns])+'\n')
df.to_string(buf=f,formatters=formatters,header=False,index=False)
f.close()

The generated filename.txt looks like this:

FORMAT Freq : R1:1 R1:2 R2:1 R2:2 
L1:1 L1:2 L2:1 L2:2 
G1:1 G1:2 G2:1 G2:2 
C1:1 C1:2 C2:1 C2:2 

0.00e+00 :  5.11e+00 7.13e-13 7.13e-13 5.11e+00
6.61e-07 4.46e-07 4.46e-07 6.61e-07
1.04e-17 -5.02e-18 -5.02e-18 1.04e-17
5.58e-11 -3.27e-11 -3.27e-11 5.58e-11

1.00e+06 :  5.59e+00 9.82e-03 9.82e-03 5.59e+00
6.97e-07 4.47e-07 4.47e-07 6.97e-07
6.42e-07 -3.11e-07 -3.11e-07 6.42e-07
5.54e-11 -3.25e-11 -3.25e-11 5.54e-11

2.00e+06 :  5.69e+00 2.47e-02 2.47e-02 5.69e+00
7.06e-07 4.48e-07 4.48e-07 7.06e-07
1.29e-06 -6.23e-07 -6.23e-07 1.29e-06
5.53e-11 -3.25e-11 -3.25e-11 5.53e-11

thanks for your elaborated answer. I tried as you showed. But after formatting, writing the data is having the problem most probably. — aguntuk, May 11 '18 at 12:40
What is the problem exactly? Is it that the output file ia not the desired one? There is an error message? If so, then what message does this error print? — OriolAbril, May 11 '18 at 12:56
sorry for late reply as I was sick. I have updated my question with the code that you suggested and the error. — aguntuk, Jun 13 '18 at 13:30
@aguntuk This error is telling you that your data is stored as strings, therefore when you try to format it in exponential notation with 2 decimals, it crashes. You should first convert them to floats and then they will be formatted as desired. This question is about this conversion: https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas — OriolAbril, Jun 13 '18 at 16:00
thanks for your reply. I tried many ways to convert the data type but still, error is there. I summarized the errors in the question for more clarification. — aguntuk, Jun 14 '18 at 12:42
As you can see in the error message, the **whole** row is read as a string. If you take a look at the [read_csv() documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv), you will see that the default separator is `,`, whereas in your case, it looks like a tab is the separator, so it should be changed to `'\s+'`. — OriolAbril, Jun 14 '18 at 16:00

Data Manipulation either in MATLAB or Python

1 Answers1