
I would like to write my data to a .txt file with the help of pandas' .to_csv(). I created a dictionary that contains the columns of my data, then converted it to a pandas df. At this step, and in the subsequent .to_csv() call, I noticed that the middle column of my data is converted from int to float. Here is the df I get:

                                  Type      Raw data  Clean data
0                     Number of Reads:  8.044857e+07    80054190
1                           Data Size:  8.044857e+09  8005419000
2                            N of fq1:  2.097854e+06        4977
3                            N of fq2:  5.575130e+05      211801
4                        GC(%) of fq1:  5.042000e+01       50.44
5                        GC(%) of fq2:  5.088000e+01       50.88
6                       Q20(%) of fq1:  9.662000e+01       96.67
7                       Q20(%) of fq2:  9.429000e+01        94.3
8                       Q30(%) of fq1:  8.769000e+01       87.74
9                       Q30(%) of fq2:  8.429000e+01        84.3
10         Discard Reads related to N:  1.696530e+05            
11  Discard Reads related to low qual:  1.971880e+05            
12   Discard Reads related to Adapter:  2.753500e+04   

This is the code I used to build it:

data_raw_fq = [raw_reads, data_raw, n_raw_fq1, n_raw_fq2, gc_raw_fq1, gc_raw_fq2,
                    q20_raw_fq1, q20_raw_fq2, q30_raw_fq1, q30_raw_fq2, discard_n,
                    discard_low, discard_adapter]

data_clean_fq = [clean_reads, data_clean, n_clean_fq1, n_clean_fq2, gc_clear_fq1,
                      gc_clear_fq2, q20_clear_fq1, q20_clear_fq2, q30_clear_fq1, q30_clear_fq2,
                      "", "", ""]

row_names = ["Number of Reads:", "Data Size:", "N of fq1:", "N of fq2:", "GC(%) of fq1:", "GC(%) of fq2:",
                  "Q20(%) of fq1:", "Q20(%) of fq2:", "Q30(%) of fq1:", "Q30(%) of fq2:",
                  "Discard Reads related to N:", "Discard Reads related to low qual:",
                  "Discard Reads related to Adapter:"]

df_data = {
    'Type': row_names,
    'Raw data': data_raw_fq,
    'Clean data': data_clean_fq
}

QC_data = pd.DataFrame.from_dict(df_data)

Here are the lists stored in the dictionary, with the values as I want to see them in the df:

data_raw_fq
[80448566, 8044856600, 2097854, 557513, 50.42, 50.88, 96.62, 94.29, 87.69, 84.29, 169653, 197188, 27535]

data_clean_fq
[80054190, 8005419000, 4977, 211801, 50.44, 50.88, 96.67, 94.3, 87.74, 84.3, '', '', '']

As you can see, in the last column all the data appears as I wanted: some of the values are float and some are int. The middle column contains the same kinds of numbers and datatypes.

Can you please help me achieve the same look for the middle column?

Thanks!

######## UPDATE ######

So I figured out that my 'Raw data' list contains int and float values, while the 'Clean data' list contains int, float AND str values. Because of that, creating the pandas df gave the following dtypes:

print(QC_data.dtypes)
Type           object
Raw data      float64
Clean data     object
dtype: object
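
The dtype difference can be reproduced in isolation. The following is a minimal sketch with shortened sample lists (the variable names are illustrative, not from my script): a list of only int and float values is upcast to float64, while a list that also holds a str falls back to object and keeps each value as it is.

```python
import pandas as pd

# Shortened, illustrative samples of the two lists
mixed_numbers = [80448566, 50.42]         # int and float only
mixed_with_str = [80054190, 50.44, ""]    # int, float and str

# Only numbers: pandas picks one numeric dtype that fits all values
raw_dtype = pd.Series(mixed_numbers).dtype
# Numbers plus a string: no common numeric dtype, so the column stays object
clean_dtype = pd.Series(mixed_with_str).dtype

print(raw_dtype)    # float64
print(clean_dtype)  # object
```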

I thought, why not just change the dtype of the second column to object as well? When I converted the second column to object, the decimals remained, so I added an empty row at the end of the df. This way every column contains int, float and str values, and therefore every column is inferred as object.

After saving this df, I got the look I wanted!
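
A possible variation that avoids the extra row, shown here only as a sketch on a shortened sample (the names raw_values and qc_sample are made up for the example): requesting dtype=object when the column is created, before any conversion to float64 happens, should keep the original Python ints and floats.

```python
import pandas as pd

# Shortened, illustrative sample of the 'Raw data' list
raw_values = [80448566, 8044856600, 50.42]

# dtype=object at construction time prevents the upcast to float64
raw_column = pd.Series(raw_values, dtype=object)

qc_sample = pd.DataFrame({
    "Type": ["Number of Reads:", "Data Size:", "GC(%) of fq1:"],
    "Raw data": raw_column,
})

# Passing no path makes to_csv return the text instead of writing a file
tsv_text = qc_sample.to_csv(sep="\t", index=False)
print(tsv_text)
```

Converting an already created float64 column with .astype(object) comes too late, because the values are floats by then; that matches the decimals I saw remaining.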

One note here, when I want to read this file in again, I have to define the dtype! Like so:

data = pd.read_csv("file.txt", sep="\t", dtype=str)
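
To illustrate why the dtype matters when reading, here is a small sketch that uses an in-memory string in place of "file.txt" (the file content is a made-up two-row sample). Without dtype=str the column is inferred as float64 again; with it, the text is kept exactly as written.

```python
import io
import pandas as pd

# Illustrative stand-in for the saved file
file_text = "Type\tRaw data\nNumber of Reads:\t80448566\nGC(%) of fq1:\t50.42\n"

# Default type inference: the mixed column becomes float64 again
inferred = pd.read_csv(io.StringIO(file_text), sep="\t")
# dtype=str: every cell is kept as the text found in the file
as_text = pd.read_csv(io.StringIO(file_text), sep="\t", dtype=str)

print(inferred["Raw data"].tolist())  # [80448566.0, 50.42]
print(as_text["Raw data"].tolist())   # ['80448566', '50.42']
```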
pahi

1 Answer


You can convert the values to strings with a defined format, e.g.:

pd.Series([80448566, 8044856600, 2097854, 557513, 50.42, 50.88, 96.62, 94.29, 87.69, 84.29, 169653, 197188, 27535]).map('{:.2f}'.format)

For your code example

QC_data["Raw data"] = QC_data["Raw data"].map('{:.2f}'.format)

should work.

See also: https://stackoverflow.com/a/20937592/15658660

  • Converting to strings is a great idea, but when I read the file back in (which I have to do at some point), the data sadly looks the same as before: a decimal point and a lot of zeros are appended to every number. – pahi Jul 19 '23 at 11:13
  • Then, you might need the global settings -> pd.options.display.float_format = '{:.2f}'.format. You can also find it here https://stackoverflow.com/a/20937592/15658660 or here https://medium.com/@anala007/float-display-in-pandas-no-more-scientific-notation-80e3dd28eabe. It changes the string output for all Pandas Objects. – Fabian Pascher Jul 19 '23 at 11:59