I'm not 100% sure what you're looking for, so I'll give a couple of examples of potential solutions. If these don't match what you're looking for, please update your question or add a comment.
Setup (following your example info):
```python
import pandas as pd

dict1 = {"gene": "ABC", "sample": "XYZ", "input": 23}
dict2 = {"gene": "DEF", "sample": "ERT", "input": 24}
columns = ["gene", "sample", "input"]
df = pd.DataFrame([dict1, dict2], columns=columns)
```
The output of df looks like:
```
  gene sample  input
0  ABC    XYZ     23
1  DEF    ERT     24
```
That looks like what you're asking for in your question. If so, you can build your DataFrame the same way as in the setup code above.
If you mean you already have that format and you're looking to transpose it, I would recommend the following:
```python
# the new columns will be the old index, 0 to n-1:
df.transpose()
# output:
#           0    1
# gene    ABC  DEF
# sample  XYZ  ERT
# input    23   24

# to get meaningful column names, set the index before transposing
# (the list needs one name per row of df)
list_that_contains_n_items_to_be_columns = ["a", "b"]
df.index = pd.Index(list_that_contains_n_items_to_be_columns)
df.transpose()
# output:
#           a    b
# gene    ABC  DEF
# sample  XYZ  ERT
# input    23   24
```
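If one of the existing columns already holds the names you want as headers (here, gene), a possible variant is to make that column the index with set_index before transposing, instead of typing the names out. This is a sketch on the example data only; note that gene then becomes the header row rather than a row of the result.

```python
import pandas as pd

dict1 = {"gene": "ABC", "sample": "XYZ", "input": 23}
dict2 = {"gene": "DEF", "sample": "ERT", "input": 24}
df = pd.DataFrame([dict1, dict2], columns=["gene", "sample", "input"])

# Use the "gene" column as the row index, then transpose:
# its values (ABC, DEF) become the column headers of the result.
df_t = df.set_index("gene").transpose()
print(df_t)
```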
If you meant you have the info you posted in a text file like:
```
gene : ABC
sample: XYX
input:23
gene : DEF
sample: ERT
input :24
```
you would need to read it in and put it into a DataFrame (much as you would with a CSV file). You could do that with:
```python
import pandas as pd

number_columns = 3  # lines per record; change this as necessary
list_of_dicts = []
dict_row = {}
with open("data.txt") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # split only on the first ":" and trim spaces around key and value
        key, val = line.split(":", 1)
        dict_row[key.strip()] = val.strip()
        if len(dict_row) == number_columns:
            # record complete: store it and start a new one
            list_of_dicts.append(dict_row)
            dict_row = {}

# add your columns to that list
df = pd.DataFrame(list_of_dicts, columns=["gene", "sample", "input"])
print(df)
```
This reads your file line by line and builds a list of dictionaries, which is easy to turn into a pandas DataFrame. If you want an actual CSV file, you can run df.to_csv("name_of_file.csv").
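Because the loop only needs something that yields lines, one way to try it without creating data.txt is to wrap it in a function and feed it an io.StringIO. The function name parse_records is hypothetical, and the sketch assumes every record has exactly number_columns lines. It also shows two details: every value is read as a string, so input needs an explicit conversion, and to_csv(index=False) leaves out the 0, 1 row labels.

```python
import io

import pandas as pd


def parse_records(f, number_columns=3):
    """Hypothetical helper: read 'key: value' lines into one dict per record."""
    records = []
    row = {}
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        key, val = line.split(":", 1)
        row[key.strip()] = val.strip()
        if len(row) == number_columns:
            records.append(row)
            row = {}
    return records


text = "gene : ABC\nsample: XYX\ninput:23\ngene : DEF\nsample: ERT\ninput :24\n"
df = pd.DataFrame(parse_records(io.StringIO(text)), columns=["gene", "sample", "input"])
df["input"] = df["input"].astype(int)  # values were read as strings

# With no path given, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)
```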
Hope one of these helps!
EDIT:
To loop over all files in a directory, you can wrap the code above in a loop like this:
```python
import glob

for filename in glob.glob("/your/path/here/*.txt"):
    # code you want to execute, e.g. open(filename) and parse it as above
    pass
```
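Putting the two pieces together, a possible sketch parses each matching file and stacks the per-file DataFrames with pd.concat. The helper name read_directory and its parameters are hypothetical, and the demo writes two small files to a temporary directory only so the example runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd


def read_directory(directory, number_columns=3, columns=("gene", "sample", "input")):
    """Hypothetical helper: parse every *.txt file in `directory` and stack the results."""
    frames = []
    for filename in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        records, row = [], {}
        with open(filename) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                key, val = line.split(":", 1)
                row[key.strip()] = val.strip()
                if len(row) == number_columns:
                    records.append(row)
                    row = {}
        frames.append(pd.DataFrame(records, columns=list(columns)))
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index;
    # pd.concat raises ValueError if no files matched.
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    for name, body in [("a.txt", "gene: ABC\nsample: XYZ\ninput: 23\n"),
                       ("b.txt", "gene: DEF\nsample: ERT\ninput: 24\n")]:
        with open(os.path.join(tmp, name), "w") as out:
            out.write(body)
    combined = read_directory(tmp)
print(combined)
```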
EDIT EDIT:
My original answer does not match what is being asked (see the comments on this answer). It seems the author has .tsv files whose data is already in a DataFrame-like format, and they want the files read in as DataFrames. The sample file given is:
```
Sample Name: 1234
Index: IB04
Input DNA: 100
Detected ITD Variants:
Size READS VRF
Sample Name: 1235
Index: IB05
Input DNA: 100
Detected Variants:
Size READS VRF
27 112995 4.44e-01
Total 112995 4.44e-01
```
Example code that reads these files and builds a DataFrame of the sample information:
```python
#!/usr/bin/python
import glob

import pandas as pd

# Reads every *.tsv file in the current directory; use os.chdir("/your/path/here")
# first if the files live elsewhere.


def get_df(num_cols=3, start_key="Sample"):
    list_of_dfs = []
    for filepath in glob.glob("*.tsv"):
        list_of_dicts = []
        dict_row = {}
        part_of_df = False
        with open(filepath) as file:
            for line in file:
                # only read in lines that are part of the dataframe
                if start_key in line:
                    part_of_df = True
                elif line.strip() == "":
                    # an empty line ends the block: stop adding lines
                    part_of_df = False
                    continue
                if part_of_df:
                    # split only on the first ":" and trim spaces
                    key, val = line.split(":", 1)
                    dict_row[key.strip()] = val.strip()
                    if len(dict_row) == num_cols:
                        # record complete: store it and skip lines until
                        # the next start_key (e.g. the variant table)
                        list_of_dicts.append(dict_row)
                        dict_row = {}
                        part_of_df = False
        df = pd.DataFrame(list_of_dicts, columns=["Sample Name", "Index", "Input DNA"])
        list_of_dfs.append(df)
    # concatenate all the files together, renumbering the rows
    final_df = pd.concat(list_of_dfs, ignore_index=True)
    return final_df


df_samples = get_df(num_cols=3, start_key="Sample")
print(df_samples)
```
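The per-file loop inside get_df does not depend on real files, so one way to check it against the sample from the question is to run the same logic over an io.StringIO. The helper name parse_sample_block is hypothetical. This sketch also converts "Input DNA" with pd.to_numeric, since every parsed value comes back as a string.

```python
import io

import pandas as pd

SAMPLE = """Sample Name: 1234
Index: IB04
Input DNA: 100
Detected ITD Variants:
Size READS VRF
Sample Name: 1235
Index: IB05
Input DNA: 100
Detected Variants:
Size READS VRF
27 112995 4.44e-01
Total 112995 4.44e-01
"""


def parse_sample_block(lines, num_cols=3, start_key="Sample"):
    """Hypothetical helper: the loop from get_df, over any iterable of lines."""
    records, row, part_of_df = [], {}, False
    for line in lines:
        if start_key in line:
            part_of_df = True
        elif not line.strip():
            part_of_df = False
            continue
        if part_of_df:
            key, val = line.split(":", 1)
            row[key.strip()] = val.strip()
            if len(row) == num_cols:
                # record complete: wait for the next start_key line
                records.append(row)
                row, part_of_df = {}, False
    return records


df = pd.DataFrame(parse_sample_block(io.StringIO(SAMPLE)),
                  columns=["Sample Name", "Index", "Input DNA"])
# Every value is read as a string; convert the numeric column explicitly.
df["Input DNA"] = pd.to_numeric(df["Input DNA"])
print(df)
```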
This creates a DataFrame with one row per sample (Sample Name, Index, Input DNA). If this created the dataset you are looking for, please mark this answer as accepted. Please ask a new question if you have further questions (posting a data file in the question is very helpful).