I am new to pandas and want to learn it better. I have a DataFrame that looks like this:
        0   1   2
0   chr2L   1   4
1   chr2L   9  12
2   chr2L  17  20
3   chr2L  23  23
4   chr2L  26  27
5   chr2L  30  40
6   chr2L  45  47
7   chr2L  52  53
8   chr2L  56  56
9   chr2L  61  62
10  chr2L  66  80
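For reproducibility, the frame above can be rebuilt roughly like this. This is only a sketch: it contains just the chr2L rows shown, and it assumes the columns carry the default integer labels 0, 1 and 2, as in the printout.

```python
import pandas as pd

# The chr2L rows from the printout above: (name, start, end).
rows = [
    ("chr2L", 1, 4),
    ("chr2L", 9, 12),
    ("chr2L", 17, 20),
    ("chr2L", 23, 23),
    ("chr2L", 26, 27),
    ("chr2L", 30, 40),
    ("chr2L", 45, 47),
    ("chr2L", 52, 53),
    ("chr2L", 56, 56),
    ("chr2L", 61, 62),
    ("chr2L", 66, 80),
]

# No column names are passed, so pandas labels the columns 0, 1, 2.
df = pd.DataFrame(rows)
print(df)
```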
I want to get something like this:
        0   1   2  3
0   chr2L   0   1  0
1   chr2L   1   2  1
2   chr2L   2   3  1
3   chr2L   3   4  1
4   chr2L   4   5  0
5   chr2L   5   6  0
6   chr2L   6   7  0
7   chr2L   7   8  0
8   chr2L   8   9  0
9   chr2L   9  10  1
10  chr2L  10  11  1
11  chr2L  11  12  1
12  chr2L  12  13  0
And so on...
In other words, I want to split the whole range into intervals of length 1, marking the positions covered by an interval in the original data with 1 and filling the gaps between them with 0. If there is an easy way to mark the "boundary" positions (the borders of the intervals in the initial data) as 0.5 at the same time, that would also be helpful.
Column 0 contains multiple distinct string values, and this should be done for each of them separately. Each one needs a different final length (the last position that should get a 0 or a 1 differs). I would appreciate help doing this in pandas.
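To make the intended behaviour concrete, here is a rough sketch of the transformation, not a definitive solution. It rests on several assumptions: intervals are treated as half-open [start, end), which is what the desired output suggests (the row 1 4 produces ones for 1-2, 2-3 and 3-4 only); a zero-length row such as 23 23 therefore marks nothing; and the final length for each name is taken to be the largest end value seen for that name, which may not be the real length. The function name `expand_intervals` and the second name `chr3R` are made up for illustration, and the 0.5 boundary marking is not covered.

```python
import numpy as np
import pandas as pd


def expand_intervals(df):
    """Expand (name, start, end) rows into unit-length bins flagged 0 or 1."""
    pieces = []
    # Handle each distinct value in column 0 separately.
    for name, group in df.groupby(0, sort=False):
        # Assumption: the track for this name ends at its largest end value.
        length = int(group[2].max())
        flags = np.zeros(length, dtype=int)
        for start, end in zip(group[1], group[2]):
            # Half-open [start, end): an interval with start == end marks nothing.
            flags[start:end] = 1
        starts = np.arange(length)
        pieces.append(
            pd.DataFrame({0: name, 1: starts, 2: starts + 1, 3: flags})
        )
    return pd.concat(pieces, ignore_index=True)


# Small illustrative input with two names of different final length.
example = pd.DataFrame([("chr2L", 1, 4), ("chr2L", 9, 12), ("chr3R", 2, 3)])
result = expand_intervals(example)
print(result)
```

For the chr2L rows this gives the 0/1 pattern from the desired output above for positions 0 through 12, and chr3R gets its own, shorter run of rows.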