I have a dataset containing 485k strings (1.1 GB).
Each string is about 700 characters long and holds about 250 variables (1-16 characters per variable), but there are no delimiters between them. The length of each variable is known. What is the best way to split the data and mark the field boundaries with a comma (,)?
For example, I have strings like:
0123456789012...
1234567890123...
and an array of lengths:
5,3,1,4,...
and I should get this:
01234,567,8,9012,...
12345,678,9,0123,...
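To make the expected behaviour concrete, here is a minimal Python sketch of the transformation on the truncated example above. The function name split_fixed_width and the variable names are only illustrative, it uses just the first four lengths, and I have not checked whether this approach is fast enough for the full file.

```python
from itertools import accumulate


def split_fixed_width(line, lengths, sep=","):
    """Insert `sep` between the fixed-width fields of `line`."""
    # Cumulative end offsets, e.g. [5, 3, 1, 4] -> [5, 8, 9, 13]
    ends = list(accumulate(lengths))
    # Each field starts where the previous one ended
    starts = [0] + ends[:-1]
    return sep.join(line[s:e] for s, e in zip(starts, ends))


# Only the first four widths and 13 characters from the example are used here
lengths = [5, 3, 1, 4]
rows = ["0123456789012", "1234567890123"]

result = [split_fixed_width(row, lengths) for row in rows]
for out in result:
    print(out)
```

For the real data the offsets would presumably be computed once and the file read line by line, so that the whole 1.1 GB never has to be held in memory at the same time.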
Could anyone help me with this? I would prefer a solution in Python or R.