
mutate1:

       Hugo_Symbol  Start_position Tumor_Seq_Allele1 Variant_Classification
5           POU3F1        38512139                 G      Missense_Mutation
356140      POU3F1        38511502                 C      Missense_Mutation
388147      POU3F1        38511377                 A      Nonsense_Mutation

I tried

>>> startpos = np.zeros(3)
>>> for ind in mutate1.index:
...     for i in range(3):
...         startpos[i] = int(mutate1['Start_position'][ind]-1)
...         print(startpos)
... 
[38512138.        0.        0.]
[38512138. 38512138.        0.]
[38512138. 38512138. 38512138.]
[38511501. 38512138. 38512138.]
[38511501. 38511501. 38512138.]
[38511501. 38511501. 38511501.]
[38511376. 38511501. 38511501.]
[38511376. 38511376. 38511501.]
[38511376. 38511376. 38511376.]

However, I want startpos = [38512138, 38511501, 38511376]. How should I change the current code?

Geinkehdsk

2 Answers


Don't iterate over DataFrames when it isn't needed. Use tolist() in a list comprehension:

startpos = [i-1 for i in mutate1["Start_position"].tolist()]
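To check the list comprehension, here is a minimal runnable sketch. It assumes `mutate1` is a pandas DataFrame, rebuilt from the question's printout with only the `Start_position` column and the same index labels. It also shows how the original loop could be fixed instead. The nested `for i in range(3)` wrote each row's value into every slot, so the loop needs exactly one slot per row, which `enumerate` provides. The names `slot` and `startpos_arr` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's printout
# (only the Start_position column is needed here).
mutate1 = pd.DataFrame(
    {"Start_position": [38512139, 38511502, 38511377]},
    index=[5, 356140, 388147],
)

# The answer's approach: one value per row, minus 1.
startpos = [i - 1 for i in mutate1["Start_position"].tolist()]
print(startpos)  # [38512138, 38511501, 38511376]

# Fixing the original loop instead: drop the inner range(3) loop and
# let enumerate give each row its own array slot. dtype=int avoids
# the float output (38512138.) seen in the question.
startpos_arr = np.zeros(len(mutate1), dtype=int)
for slot, ind in enumerate(mutate1.index):
    startpos_arr[slot] = mutate1.loc[ind, "Start_position"] - 1
print(startpos_arr)  # [38512138 38511501 38511376]
```

The list comprehension avoids the explicit index bookkeeping entirely, which is why it is the simpler choice here.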
not_speshal

Assuming you are using a pandas DataFrame, which seems to be the case, it is bad practice to iterate through it. pandas has a built-in Series method called to_list.

So just use startpos = mutate1['Start_position'].to_list()

To subtract one from each value, use a list comprehension: startpos = [x - 1 for x in startpos]

Then convert the Python list to a NumPy array with startpos = np.array(startpos)
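The three steps above can be put together as a minimal runnable sketch. It assumes `mutate1` is a pandas DataFrame rebuilt from the question's printout. The last part is an alternative that is not from this answer: subtracting on the whole Series and converting once with `to_numpy()` skips the intermediate Python list. The name `startpos_vec` is illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's printout.
mutate1 = pd.DataFrame(
    {"Start_position": [38512139, 38511502, 38511377]},
    index=[5, 356140, 388147],
)

# The three steps from the answer.
startpos = mutate1["Start_position"].to_list()  # plain Python list
startpos = [x - 1 for x in startpos]            # subtract 1 from each value
startpos = np.array(startpos)                   # convert to a NumPy array

# Vectorized alternative (an assumption, not from the answer):
# subtract on the whole Series, then convert once.
startpos_vec = (mutate1["Start_position"] - 1).to_numpy()

print(startpos)                                # [38512138 38511501 38511376]
print(np.array_equal(startpos, startpos_vec))  # True
```

Both versions produce the same integer array. The vectorized form does the arithmetic in pandas/NumPy rather than in a Python loop, which matters mainly for large DataFrames.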

Matt