I have a .txt file with the following structure:
#n 1
a 1:0.0002 3:0.0003...
#n 2
b 2:0.0002 3:0.0003...
#n 3
a 1:0.0002 2:0.0003...
...
I'm trying to parse it into a DataFrame with the following structure:
# type 1 2 3
1 a 0.0002 null 0.0003 ....
2 b null 0.0002 0.0003 ....
3 a 0.0002 0.0003 null ....
...
The rules are:
# i - 'i' is the row number
n:data - 'n' is the column number to fill, and 'data' is the value to put into that column of the i-th row
If the number of columns were small enough, this could be done manually, but the file has roughly 2000-3000 columns, and some of the values are missing.
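To make the n:data rule concrete, here is a minimal sketch that applies it to a single data line. It assumes the type label and the n:value pairs are separated by whitespace and that every value parses as a float; `parse_data_line` is a hypothetical helper name, not part of pandas.

```python
def parse_data_line(line):
    """Apply the n:data rule to one data line.

    'a 1:0.0002 3:0.0003' -> ('a', {1: 0.0002, 3: 0.0003})
    """
    label, *pairs = line.split()  # first token is the type, the rest are n:data pairs
    values = {}
    for pair in pairs:
        n, data = pair.split(":", 1)  # n = column number, data = value for that column
        values[int(n)] = float(data)
    return label, values
```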
import pandas as pd
data = pd.read_csv("filename.txt", sep = "#", header = None)
data1 = data.iloc[1::2]
data2 = data.iloc[::2]
I tried splitting the rows so that data1 holds only the data lines and data2 only the "#n" lines. Next I hoped to figure out how to split the data lines and merge the two DataFrames, but there might be a faster and more elegant way to do it, which is why I'm asking here.
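One possibly simpler alternative skips `read_csv` and the two-DataFrame merge entirely: read the file line by line, collect one dict per record, and let pandas create the columns in a single step. This is a sketch, assuming each "#" line ends with the row number and is immediately followed by its data line; `read_sparse_file` is a hypothetical name. Column numbers that never appear in a row come out as NaN, which is pandas' counterpart of the null in the target table.

```python
import pandas as pd


def read_sparse_file(lines):
    """Build a DataFrame from alternating '#n <row>' and '<type> n:value ...' lines.

    `lines` can be an open file or any iterable of strings.
    """
    rows = {}
    current = None  # row number taken from the most recent '#' line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("#"):
            current = int(line.split()[-1])  # '#n 1' or '#n\t1' -> 1
            continue
        label, *pairs = line.split()
        record = {"type": label}
        for pair in pairs:
            n, value = pair.split(":", 1)  # 'n:value' -> column n gets value
            record[int(n)] = float(value)
        rows[current] = record
    # one row per record; every n seen anywhere becomes a column, gaps become NaN
    df = pd.DataFrame.from_dict(rows, orient="index")
    value_cols = sorted(c for c in df.columns if c != "type")
    return df[["type"] + value_cols].sort_index()
```

Usage would be `with open("filename.txt") as fh: df = read_sparse_file(fh)`. Collecting plain dicts first and building the DataFrame once tends to be faster than adding thousands of columns one at a time.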
Update: I spent 3 hours figuring out how to work with DataFrames, as I was not very familiar with them. Here is where I am now:
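import pandas as pd

df = pd.read_csv("myfile.txt", sep="#", header=None)

# even rows come from the "#n<TAB>i" lines: keep only the row number i
# (write through df.at; assigning to the rows yielded by iterrows is not
# guaranteed to change df, because they may be copies)
for index, col in df.iterrows():
    if index % 2 == 0:
        df.at[index, 1] = int(col[1].split('\t')[1])

# odd rows hold "type n:value ...": split each one into a list of tokens
for index, col in df.iterrows():
    if index % 2 == 1:
        df.at[index, 0] = col[0].split(' ')

# move each token list up next to its row number, then keep only those rows
df[0] = df[0].shift(-1)
df = df.iloc[::2]

# put the row number first and swap the labels so it becomes column 0
df = df[[1, 0]]
df = df.rename(columns={0: 1, 1: 0})
df = df.reset_index(drop=True)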
Any suggestions on how to add an unknown number of extra columns and fill them from the "n:value" tokens in each list, so that "value" goes into column "n"?
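Starting from the DataFrame the update's code produces (column 0 holds the row number, column 1 the token list such as `['a', '1:0.0002', '3:0.0003']`), a vectorized pandas sketch could explode the n:value tokens into long format, split them, and pivot so that each distinct n becomes a column. `expand_tokens` is a hypothetical name, and the code assumes every non-empty token after the type label contains a colon.

```python
import pandas as pd


def expand_tokens(df):
    """Spread ['type', 'n:value', ...] token lists into one column per n.

    Assumes column 0 holds the row number and column 1 the token list.
    """
    tokens = df.set_index(0)[1].rename_axis("row")        # row number -> token list
    types = tokens.map(lambda t: t[0]).rename("type")      # first token is the type label
    pairs = tokens.map(lambda t: t[1:]).explode().dropna()  # one 'n:value' token per row
    pairs = pairs[pairs != ""]                             # drop empties left by repeated spaces
    long = (
        pairs.str.split(":", n=1, expand=True)
        .rename(columns={0: "col", 1: "value"})
        .astype({"col": int, "value": float})
    )
    # each distinct n becomes a column; (row, n) pairs that never occur become NaN
    wide = long.pivot(columns="col", values="value")
    wide = wide.reindex(columns=sorted(wide.columns))
    return types.to_frame().join(wide)
```

Calling `expand_tokens(df)` on that result should give one row per record, with a `type` column followed by the numeric columns in ascending order, so the number of columns never has to be known in advance.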