I would like to write my data to a .txt file using pandas' DataFrame.to_csv(). I created a dictionary that holds the columns of my data, then converted it to a pandas DataFrame. At this step, and in the subsequent .to_csv() call, I noticed that the middle column of my data is converted from int to float. Here is the df I get:
    Type                                Raw data      Clean data
0   Number of Reads:                    8.044857e+07  80054190
1   Data Size:                          8.044857e+09  8005419000
2   N of fq1:                           2.097854e+06  4977
3   N of fq2:                           5.575130e+05  211801
4   GC(%) of fq1:                       5.042000e+01  50.44
5   GC(%) of fq2:                       5.088000e+01  50.88
6   Q20(%) of fq1:                      9.662000e+01  96.67
7   Q20(%) of fq2:                      9.429000e+01  94.3
8   Q30(%) of fq1:                      8.769000e+01  87.74
9   Q30(%) of fq2:                      8.429000e+01  84.3
10  Discard Reads related to N:         1.696530e+05
11  Discard Reads related to low qual:  1.971880e+05
12  Discard Reads related to Adapter:   2.753500e+04
This is how I built the DataFrame:
data_raw_fq = [raw_reads, data_raw, n_raw_fq1, n_raw_fq2, gc_raw_fq1, gc_raw_fq2,
               q20_raw_fq1, q20_raw_fq2, q30_raw_fq1, q30_raw_fq2, discard_n,
               discard_low, discard_adapter]
data_clean_fq = [clean_reads, data_clean, n_clean_fq1, n_clean_fq2, gc_clear_fq1,
                 gc_clear_fq2, q20_clear_fq1, q20_clear_fq2, q30_clear_fq1, q30_clear_fq2,
                 "", "", ""]
row_names = ["Number of Reads:", "Data Size:", "N of fq1:", "N of fq2:", "GC(%) of fq1:", "GC(%) of fq2:",
             "Q20(%) of fq1:", "Q20(%) of fq2:", "Q30(%) of fq1:", "Q30(%) of fq2:",
             "Discard Reads related to N:", "Discard Reads related to low qual:",
             "Discard Reads related to Adapter:"]
df_data = {
    'Type': row_names,
    'Raw data': data_raw_fq,
    'Clean data': data_clean_fq
}
QC_data = pd.DataFrame.from_dict(df_data)
These are the values in the lists that go into the dictionary, and this is how I want them to appear in the df:
data_raw_fq
[80448566, 8044856600, 2097854, 557513, 50.42, 50.88, 96.62, 94.29, 87.69, 84.29, 169653, 197188, 27535]
data_clean_fq
[80054190, 8005419000, 4977, 211801, 50.44, 50.88, 96.67, 94.3, 87.74, 84.3, '', '', '']
As you can see, in the last column all the data appears the way I want: some of the values are float and some are int. In the middle column, however, every number has been converted to the same datatype (float).
Can you please help me get the middle column to look the same as the last one?
Thanks!
######## UPDATE ######
So I figured out that my 'Raw data' list contains only int and float values, while the 'Clean data' list contains int, float AND str values. Because of that, when I created the pandas df, the following happened:
print(QC_data.dtypes)
Type object
Raw data float64
Clean data object
dtype: object
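The inference described above can be reproduced with a small sketch. The shortened lists `raw` and `clean` are stand-ins for the full lists in the question, not the original data; the point is only that a list of ints and floats is upcast to float64, while a list that also contains a str stays as object and keeps each value's original Python type.

```python
import pandas as pd

# Shortened stand-ins for data_raw_fq and data_clean_fq (assumed for illustration)
raw = [80448566, 50.42, 169653]   # int and float only
clean = [80054190, 50.44, ""]     # int, float and str

df = pd.DataFrame({"Raw data": raw, "Clean data": clean})

# 'Raw data' is inferred as float64, so every int is upcast to float.
# 'Clean data' cannot be numeric because of the "", so it stays object.
print(df.dtypes)

# In the object column the first value is still a plain Python int.
print(type(df.loc[0, "Raw data"]))
print(type(df.loc[0, "Clean data"]))
```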
My first thought was to simply change the dtype of the 'Raw data' column to object as well. But when I converted that column to object, the decimals remained. So instead I added an empty row at the end of the df, so that every column contains int, float and str values. As a result, all of my columns are inferred as object.
After saving this df, I got the look I wanted!
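A possible alternative to the empty-row workaround, as a sketch only: request the object dtype when the column is created, before pandas has a chance to upcast the ints. The shortened lists and the in-memory buffer are assumptions for illustration; with a real file you would pass a path to to_csv() instead.

```python
import io
import pandas as pd

# Shortened stand-ins for the lists in the question (assumed for illustration)
row_names = ["Number of Reads:", "GC(%) of fq1:", "Discard Reads related to N:"]
raw = [80448566, 50.42, 169653]
clean = [80054190, 50.44, ""]

df = pd.DataFrame({
    "Type": row_names,
    # dtype=object at construction keeps each value as the int or float it was
    "Raw data": pd.Series(raw, dtype=object),
    "Clean data": clean,
})

# Write to an in-memory buffer instead of a file so the sketch is self-contained
buf = io.StringIO()
df.to_csv(buf, sep="\t", index=False)
text = buf.getvalue()
print(text)
```

Converting with astype(object) after the df already exists is too late, because by then the values are stored as floats.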
One note: when I read this file back in, I have to specify the dtype, like so:
data = pd.read_csv("file.txt", sep="\t", dtype=str)
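To illustrate why the dtype matters on the way back in, here is a minimal sketch that reads the same tab-separated text with and without dtype=str. The sample text is a shortened, made-up version of the saved file. The keep_default_na=False variant is an optional extra, not part of the original approach, for the case where empty cells should come back as empty strings rather than NaN.

```python
import io
import pandas as pd

# Shortened stand-in for the contents of file.txt (assumed for illustration)
text = (
    "Type\tRaw data\tClean data\n"
    "Number of Reads:\t80448566\t80054190\n"
    "GC(%) of fq1:\t50.42\t50.44\n"
    "Discard Reads related to N:\t169653\t\n"
)

# Default inference: the mixed int/float column becomes float64 again
inferred = pd.read_csv(io.StringIO(text), sep="\t")

# dtype=str: every cell is kept exactly as it was written
as_text = pd.read_csv(io.StringIO(text), sep="\t", dtype=str)

# Optional: also keep empty cells as "" instead of NaN
kept = pd.read_csv(io.StringIO(text), sep="\t", dtype=str, keep_default_na=False)

print(inferred["Raw data"].tolist())
print(as_text["Raw data"].tolist())
```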