Say I have the data file shown below. The awk command that follows it
splits the file into multiple parts based on the value in the first column and writes each part to its own file.
chr pos idx len
2 23 4 4
2 25 7 3
2 29 8 2
2 35 1 5
3 37 2 5
3 39 3 3
3 41 6 3
3 45 5 5
4 25 3 4
4 32 6 3
4 38 5 4
awk 'BEGIN {FS=OFS="\t"} {print > "file_"$1".txt"}' write_multiprocessData.txt
The above command splits the file into file_2.txt, file_3.txt, and so on. Since awk writes these parts to disk first, I would rather write a Python script that calls awk,
splits the file, and loads the parts directly into memory on Linux (giving the data unique variable names such as file_1, file_2).
Would this be possible? If not, what other variations can I try?
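
One variation that might work is to skip awk and do the grouping in Python itself, keeping the parts in a dictionary instead of in separately named variables. The following is only a minimal sketch: the function name split_by_first_column and the "file_" key prefix are hypothetical, and it assumes the columns are whitespace-separated and that the first line is a header.

```python
from collections import defaultdict
import io


def split_by_first_column(lines, has_header=True):
    """Group whitespace-separated rows by their first field, in memory."""
    groups = defaultdict(list)
    rows = iter(lines)
    if has_header:
        next(rows, None)  # skip the "chr pos idx len" header line
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # Hypothetical naming scheme: mirror the file_<chr> names used with awk
        groups["file_" + fields[0]].append(fields)
    return dict(groups)


# Small in-memory sample standing in for write_multiprocessData.txt
sample = io.StringIO(
    "chr pos idx len\n"
    "2 23 4 4\n"
    "2 25 7 3\n"
    "3 37 2 5\n"
    "4 25 3 4\n"
)
parts = split_by_first_column(sample)
print(sorted(parts))  # ['file_2', 'file_3', 'file_4']
```

With the real file, the same function could be called as parts = split_by_first_column(open("write_multiprocessData.txt")), after which parts["file_2"] holds the rows for chr 2. A dictionary keyed by name is used here because creating variable names dynamically is generally harder to work with than looking the parts up by key.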