
Below is an excerpt from a pandas time series. I would like to slice out the rows set apart by the blank lines. There is a recurring pattern: value X repeated 21 times, four different values A, B, C, D, value Y repeated 21 times, four different values E, F, G, H, value Z repeated 21 times, and so on. In this case, I'm interested in obtaining B, C, D plus the first Y, then F, G, H plus the first Z, and so on.

The problem is that this pattern is sometimes interrupted due to missing data.

Any ideas? Thanks.

2014-01-20 00:00:00    197021
2014-01-20 00:01:00    197021
2014-01-20 00:02:00    197021
2014-01-20 00:03:00    197021
2014-01-20 00:04:00    197021
2014-01-20 00:05:00    197021
2014-01-20 00:06:00    197021
2014-01-20 00:07:00    197021
2014-01-20 00:08:00    197021
2014-01-20 00:09:00    197021
2014-01-20 00:10:00    197021
2014-01-20 00:11:00    197021
2014-01-20 00:12:00    197021
2014-01-20 00:13:00    197021
2014-01-20 00:14:00    197021
2014-01-20 00:15:00    197021
2014-01-20 00:16:00    196836

2014-01-20 00:17:00    196865
2014-01-20 00:18:00    196787
2014-01-20 00:19:00    196915
2014-01-20 00:20:00    196902

2014-01-20 00:21:00    196902
2014-01-20 00:22:00    196902
2014-01-20 00:23:00    196902
2014-01-20 00:24:00    196902
2014-01-20 00:25:00    196902
2014-01-20 00:26:00    196902
2014-01-20 00:27:00    196902
2014-01-20 00:28:00    196902
2014-01-20 00:29:00    196902
user2143353

1 Answer


Can you not just iterate through the lines, appending each one to the current group's list, and switch to the next list whenever you hit an empty line? This assumes that missing data never produces an empty line of its own; otherwise a gap in the data would be mistaken for a group separator.

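# file_obj is an already-open text file containing the series dump
x = 1
groups = {1: [], 2: [], 3: []}
for line in file_obj:
    if line.strip():  # non-blank line ("\n" alone counts as blank): add it to the current group
        groups[x].append(line)
    else:  # blank line: switch to the next group, wrapping from 3 back to 1
        x += 1
        if x > 3:
            x = 1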
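A variant sketch of the same idea (not part of the original answer) uses `itertools.groupby` to split on blank lines, so it keeps every group in order instead of cycling through three fixed slots. The helper name `split_on_blank_lines` is hypothetical, and the sample lines are copied from the excerpt in the question.

```python
from itertools import groupby


def split_on_blank_lines(lines):
    """Split an iterable of text lines into lists separated by blank lines (hypothetical helper)."""
    groups = []
    # groupby clusters consecutive lines by whether they are blank
    for is_blank, chunk in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            groups.append([line.rstrip("\n") for line in chunk])
    return groups


# A few lines copied from the excerpt, including the two blank separators
sample = [
    "2014-01-20 00:16:00    196836\n",
    "\n",
    "2014-01-20 00:17:00    196865\n",
    "2014-01-20 00:18:00    196787\n",
    "2014-01-20 00:19:00    196915\n",
    "2014-01-20 00:20:00    196902\n",
    "\n",
    "2014-01-20 00:21:00    196902\n",
]
groups = split_on_blank_lines(sample)
# groups[1] holds the four lines from 00:17 to 00:20
```

An open file object can be passed in place of `sample`. Several consecutive blank lines collapse into a single separator, and if the file starts with a run of repeated values, the wanted blocks sit at the odd positions `groups[1]`, `groups[3]`, and so on.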
TankorSmash
Thanks. I already thought of an iterative approach, but I would like to know if there is a vectorized way of doing things. – user2143353 Feb 03 '14 at 15:49
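One possible vectorized sketch (not from the thread) works on the Series itself rather than on the printed text. It flags rows whose value differs from the previous row using `Series.ne` and `Series.shift`, then keeps only changes that directly follow another change, which selects B, C, D and the first Y of each block. It assumes every run of repeated values is at least two rows long, that A, B, C, D each differ from their neighbours, and that missing data only removes rows. The helper name `transition_blocks` is hypothetical, and the Series is rebuilt from the excerpt in the question.

```python
import pandas as pd


def transition_blocks(s: pd.Series) -> list:
    """Return the B, C, D, Y-style blocks of a series (hypothetical helper).

    Assumes every run of repeated values is at least two rows long and that
    the four values between runs each differ from their neighbours.
    """
    changed = s.ne(s.shift())  # True where the value differs from the previous row
    prev_changed = changed.shift(fill_value=False)
    keep = changed & prev_changed  # a change that directly follows another change
    starts = keep & ~keep.shift(fill_value=False)  # first kept row of each block
    block_id = starts.cumsum()  # running block number
    return [block for _, block in s[keep].groupby(block_id[keep])]


# Rebuild the excerpt from the question: one value per minute, 00:00 to 00:29
values = [197021] * 16 + [196836, 196865, 196787, 196915] + [196902] * 10
s = pd.Series(values, index=pd.date_range("2014-01-20 00:00", periods=30, freq="min"))

for block in transition_blocks(s):
    print(block)  # prints the rows from 00:17 to 00:20 (B, C, D and the first Y)
```

The mask is built entirely from whole-column comparisons, so no Python loop touches individual rows; only the final `groupby` iterates, once per block. If missing data removes some of the A to D rows, the blocks come out shorter rather than misaligned, but a missing run of repeated values would merge two blocks, so that case needs checking separately.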