I am trying to renumber the atoms in a particular file format and am struggling to get it right.
First, some background: in computational chemistry there is a file format with the extension .xyz that describes the structure of a molecule. The first column is the number that identifies a specific atom (carbon, hydrogen, etc.), and the remaining columns list the numbers of the atoms it is connected to. Below is a small sample of such a file; a typical file is significantly larger.
259 252
260 254
261 255
262 256
264 248 265 268
265 264 266 269 270
266 265 267 282
267 266
268 264
269 265
270 265 271 276 277
271 270 272 273
272 271 274 278
273 271 275 279
274 272 275 280
275 273 274 281
276 270
277 270
278 272
279 273
280 274
282 266 283 286
283 282 284 287 288
284 283 285 289
285 284
286 282
287 283
288 283
289 284 290 293
290 289 291 294 295
291 290 292 304
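To make the layout concrete, here is a minimal Python sketch that reads this format into a dictionary mapping each atom number to the atom numbers it is connected to. It assumes every non-blank line holds only whitespace-separated integers, with the atom's own number first. parse_connectivity is a hypothetical helper name.

```python
def parse_connectivity(text: str) -> dict[int, list[int]]:
    """Map each atom number (first column) to its connected atom numbers."""
    table: dict[int, list[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        atom, *neighbors = (int(f) for f in fields)
        table[atom] = neighbors
    return table


sample = """264 248 265 268
265 264 266 269 270
266 265 267 282"""
print(parse_connectivity(sample))
# {264: [248, 265, 268], 265: [264, 266, 269, 270], 266: [265, 267, 282]}
```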
As you can see, the numbers 263 and 281 are missing from the first column. In general there can be many more gaps, so the script needs to handle any number of them. Below is my code so far. I have hard-coded the lists missing_nums and missing_nums2 here, although I would normally obtain them from an earlier part of the script. The last element of missing_nums2 is where I want the numbering to finish, in this case 289.
missing_nums = ['263', '281']
missing_nums2 = ['281', '289']

# Read the original connectivity file into one string
with open("atom_nums.xyz", "r") as f2:
    lines = f2.read()

for i in range(0, len(missing_nums) - 1):
    if i == 0:
        # First gap: shift the numbers between missing_nums[0] and missing_nums2[0] down by one
        with open("atom_nums_out.xyz", "w") as f2:
            replacement = int(missing_nums[i])
            for number in range(int(missing_nums[i]) + 1, int(missing_nums2[i])):
                lines = lines.replace(str(number), str(replacement))
                replacement += 1
            f2.write(lines)
    else:
        # Later gaps: re-read the partially renumbered output, then shift the
        # numbers from missing_nums[i] up to missing_nums2[i] down by i + 1
        with open("atom_nums_out.xyz", "r") as f2:
            lines = f2.read()
        with open("atom_nums_out.xyz", "w") as f2:
            replacement = int(missing_nums[i]) - (i + 1)
            print(replacement)
            for number in range(int(missing_nums[i]), int(missing_nums2[i])):
                lines = lines.replace(str(number), str(replacement))
                replacement += 1
            f2.write(lines)
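The hard-coded missing_nums list could in principle be derived from the first column instead. Here is a minimal sketch. It assumes the missing numbers are simply the integers between the smallest and largest first-column numbers that never appear there. find_missing is a hypothetical name, and it returns integers rather than the strings used above.

```python
def find_missing(atom_ids: list[int]) -> list[int]:
    """Return the integers between min and max of atom_ids that never occur."""
    present = set(atom_ids)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# First-column numbers from the sample in the question
first_column = [259, 260, 261, 262, 264, 265, 266, 267, 268, 269, 270,
                271, 272, 273, 274, 275, 276, 277, 278, 279, 280,
                282, 283, 284, 285, 286, 287, 288, 289, 290, 291]
print(find_missing(first_column))  # [263, 281]
```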
The problem is that as the file gets larger, some numbers end up repeated, and I cannot figure out why. I hope somebody can help.
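One possible contributor to the duplicates, offered as a hypothesis rather than a confirmed diagnosis, is that str.replace matches substrings, not whole numbers. Once a file contains four-digit atom numbers, replacing "264" also rewrites the 264 inside 1264 or 2640. The made-up line below shows the effect and contrasts it with a whole-number replacement using re.sub and the \b word-boundary anchor.

```python
import re

line = "1264 264 2640"  # made-up line mixing 3- and 4-digit atom numbers

# Plain substring replacement changes all three numbers
print(line.replace("264", "263"))  # 1263 263 2630

# \b anchors the match to whole numbers, so only the standalone 264 changes
print(re.sub(r"\b264\b", "263", line))  # 1264 263 2640
```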
EDIT: For the sample above, the desired output is:
259 252
260 254
261 255
262 256
263 248 264 267
264 263 265 268 269
265 264 266 280
266 265
267 263
268 264
269 264 270 275 276
270 269 271 272
271 270 273 277
272 270 274 278
273 271 274 279
274 272 273 279
275 269
276 269
277 271
278 272
279 273
280 265 281 284
281 280 282 285 286
282 281 283 287
283 282
284 280
285 281
286 281
287 282 288 291
288 287 289 292 293
289 288 290 302
This is indeed what I get for this small sample, but as the number of missing numbers grows, the script stops working and I get duplicate numbers. I can provide the whole file if anyone wants it.
Thanks!