So I have a file which I can open and read with Python's read() function, which returns one large string that essentially looks like a data frame, but is still just one large string. For example, it could look something like this:
1609441 test.test1.test3 1/15.34 -1 100 622 669
160441 test.test1.test3 2/11.101 -1 100 140216 177363
16041 test2.test8.test6 2/15.34 -1 100 2791 2346
160441 test.test7.test5 2/15.34 1 100 Bin Any 5 1794 2346
1609441 test4.test4.test4 2/15.34 1 100 E Any 5 997 0
1642 test4.test3.test1 28.0.101 -1 100 5409155 10357332
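
For context, this is roughly how I'm reading the file in the first place (the file name is just a placeholder):

# the whole file ends up as one big string in `raw`
with open("data.txt") as f:
    raw = f.read()
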
If it were a real data frame, it would look like:
1609441 test.test1.test3 1/15.34 -1 100 622 669
160441 test.test1.test3 2/11.101 -1 100 140216 177363
16041 test2.test8.test6 2/15.34 -1 100 2791 2346
160441 test.test7.test5 2/15.34 1 100 Bin A 5 1794 2346
1609441 test4.test4.test4 2/15.34 1 100 E A 5 997 0
1642 test4.test3.test1 28.0.101 -1 1 155 7332
So as can be seen, the data varies a lot: some rows have 10 fields of data, some only have 7, and so on. Again, this is all one large text string, and I have tried read_csv and read_fwf, but I haven't really succeeded.
Ideally it would just create a data frame with a fixed number of columns (I know the maximum number of columns), and wherever a row doesn't have a value, just put a NaN instead.
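
What I was imagining is something along these lines: wrap the string in a StringIO and pass a fixed list of column names, so that rows with fewer fields come out padded with NaN on the right. The column names and the maximum of 10 columns are just placeholders based on my example, and raw is the string from read() above.

import io
import pandas as pd

max_cols = 10  # I know the maximum number of columns in advance
col_names = [f"col{i}" for i in range(max_cols)]  # placeholder names

# Forcing a fixed set of column names should make rows with fewer
# fields end up with NaN in the missing columns.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",        # the fields look whitespace-separated
    header=None,
    names=col_names,
    engine="python",
)
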
Can this be achieved in any way?