I have a data frame with three columns named 'Altitude', 'Distance' and 'Slope'. The 'Slope' column is calculated from the first two columns, 'Altitude' and 'Distance'. The first step was to calculate 'Slope' under the following condition: starting from the top of the 'Distance' column, add up the values until their sum is greater than or equal to 10 (>= 10). When the condition is met, calculate the slope with the formula Slope = Average(Altitude) / Sum(Distance), where both the average and the sum are taken over the rows from the start of the current group to the index where the summation stopped. The running totals are then reset and the process repeats from the next row. The sample data and the code that implements this (by Tim Roberts) follow:

   Altitude  Distance
0      11.2     0.000
1      11.2     3.018
2      10.9     4.180
3      10.1     4.873
4       9.9     5.499
5       9.4     5.923
6       9.2     6.415
7       8.5     1.063
8       8.4     1.667
9       7.9     3.114

import pandas as pd

data = [
[11.2,     0],
[11.2,     3.018],
[10.9,     4.18],
[10.1,     4.873],
[9.9 ,     5.499],
[9.4 ,     5.923],
[9.2 ,     6.415],
[8.5 ,     1.063],
[8.4 ,     1.667],
[7.9 ,     3.114]
]

df = pd.DataFrame( data, columns=['Altitude','Distance'])
print( df )

s=[]
sumdist = 0
sumalt = 0
cntx = 0
for i in range(len(df)):
    sumdist += df.loc[i,'Distance']
    sumalt += df.loc[i,'Altitude']
    cntx += 1
    if sumdist >= 10:
        KM_mean = sumalt / cntx / sumdist
        s.append(KM_mean)
        sumdist = sumalt = 0
        cntx = 0
if cntx:
    s.append( sumalt / cntx / sumdist )
print(s)

Output: SLOPE: [0.8988484798276862, 0.8448607949571003, 0.6933681376947548]
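
As a quick check of where the first value and its row count come from, the arithmetic for the first group can be reproduced by hand. This is only a sketch: the names altitude, distance, total_distance, mean_altitude and first_slope are illustrative and do not appear in the code above.

```python
# Rows 0 to 3 of the sample data: the running Distance sum first reaches 10 at row 3.
altitude = [11.2, 11.2, 10.9, 10.1]
distance = [0, 3.018, 4.18, 4.873]

total_distance = sum(distance)                  # about 12.071, so four rows are needed
mean_altitude = sum(altitude) / len(altitude)   # average altitude over those four rows
first_slope = mean_altitude / total_distance    # same as sumalt / cntx / sumdist

print(round(first_slope, 6))  # 0.898848
```

The first three rows alone sum to less than 10, which is why the first group spans four rows.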


My question, the next part: I want to repeat the values returned by the code, [0.8988484798276862, 0.8448607949571003, 0.6933681376947548], in a new column. Each value should be repeated once for every row in the group it was calculated from. For example, 0.8988484798276862 should be repeated four times, then 0.8448607949571003 twice, and so on.

I have written the code below, but it returns empty values:

RR=[]
for i in list(range(df.shape[0])):
    sumdist += df.loc[i,'Distance']
    sumalt += df.loc[i,'Altitude']
    cntx += 1
    if sumdist >= 10:
       R_s=np.repeat(df['Slope'].to_numpy()) 
       RR.append(R_s)

RR=DataFrame(RR)

Machavity
Allin
  • Could you write your question in a more precise manner? I don't really get what needs to be done. – DSteman May 19 '21 at 12:18
  • Can you please clarify what you mean by "values associated" to the slopes. Where do the numbers 4 and 2 come from? Also please include your expected output for the provided data. This can help to further clarify the format the data should be in. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker May 19 '21 at 12:27
  • Thanks Henry. I want to repeat the calculated 'Slope' values, which currently sit in a new column with three rows, from the first row to the end. The number of repetitions depends on the condition that the sum of the 'Distance' values is greater than or equal to 10 (sum(Distance) >= 10). For example, the first 'Slope' value, 0.898848, will be repeated four times in the new column because the sum of the first four 'Distance' values is greater than 10. The second 'Slope' value is then calculated for the next two 'Distance' values (index 4 to 5); please see the figure. – Allin May 19 '21 at 12:42

1 Answer

Use this code after you calculate s to get a Slope column with the desired values:

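sum_distance = 0  # running sum of Distance within the current group
count = 0         # index of the current group's value in s
idx = 0           # number of rows in the current group
slopes = []

for i in df['Distance'].values:
    idx += 1
    sum_distance += i
    if sum_distance >= 10:
        # group complete: repeat its slope once per row, then reset
        slopes += [s[count]]*idx
        count += 1
        sum_distance = 0
        idx = 0

# leftover rows whose sum never reached 10 take the last value in s
if idx > 0:
    slopes += [s[count]]*idx

df['Slope'] = slopes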

Output:

>>> df
   Altitude  Distance     Slope
0      11.2     0.000  0.898848
1      11.2     3.018  0.898848
2      10.9     4.180  0.898848
3      10.1     4.873  0.898848
4       9.9     5.499  0.844861
5       9.4     5.923  0.844861
6       9.2     6.415  0.693368
7       8.5     1.063  0.693368
8       8.4     1.667  0.693368
9       7.9     3.114  0.693368

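The code walks the Distance column, accumulating a running sum and counting the rows traversed (idx). Whenever the sum reaches 10 or more, it takes the next value from s and appends it once for each row counted, then resets the sum and the row counter and continues. Any rows left over at the end, whose sum never reached 10, receive the last value in s.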
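
An alternative, offered only as a sketch, is to record the size of each group while the slopes are being calculated and then expand them with np.repeat, which accepts a list of per-element repeat counts. The function grouped_slopes and its threshold parameter are hypothetical names introduced here for illustration; they are not part of the question or of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Altitude": [11.2, 11.2, 10.9, 10.1, 9.9, 9.4, 9.2, 8.5, 8.4, 7.9],
    "Distance": [0, 3.018, 4.18, 4.873, 5.499, 5.923, 6.415, 1.063, 1.667, 3.114],
})

def grouped_slopes(frame, threshold=10):
    """Return (slopes, sizes): one slope per group and each group's row count."""
    slopes, sizes = [], []
    sumdist = sumalt = 0.0
    rows = 0
    for alt, dist in zip(frame["Altitude"], frame["Distance"]):
        sumdist += dist
        sumalt += alt
        rows += 1
        if sumdist >= threshold:
            # group complete: mean altitude divided by summed distance
            slopes.append(sumalt / rows / sumdist)
            sizes.append(rows)
            sumdist = sumalt = 0.0
            rows = 0
    if rows:
        # trailing rows whose summed distance never reached the threshold
        slopes.append(sumalt / rows / sumdist)
        sizes.append(rows)
    return slopes, sizes

slopes, sizes = grouped_slopes(df)
# repeat each slope once per row of its group; the sizes add up to len(df)
df["Slope"] = np.repeat(slopes, sizes)
print(sizes)
print(df)
```

Because the group sizes always add up to the number of rows, the repeated array has the same length as the index, which avoids a length mismatch when the last few rows do not reach the threshold.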

Ank
  • Thanks, Ank, for the answer. It works for the sample data, but my real data has 4434 rows in 'Distance'. When I run the code on the full data I get this error message: "Length of values (4430) does not match length of index (4434)" – Allin May 19 '21 at 13:25
  • Yes, that could be because in your real data the sum of the last few values of Distance might not be >= 10. I have edited the code to fix this. Give it another try. – Ank May 19 '21 at 13:39
  • Thanks. The first part of my question was about the 'Slope' calculation. I made some changes to that calculation. Would you please take a look and come back to me? The comment box limits how much I can write, so please take a look at the beginning of the question. – Allin May 19 '21 at 16:04
  • I get it, there are additional steps to calculate the slope now. However, you should consider posting that as a new question and linking this question/answer in that post. – Ank May 19 '21 at 16:16
  • The reason is that if I code the additional steps for the slope, there is a possibility I may break the code. This would be confusing to other users who happen to visit this post later. – Ank May 19 '21 at 16:17
  • Let me know when you post it and I will help you out :) – Ank May 19 '21 at 16:29
  • Thanks Ank, I have created a new question entitled "Condition function inclusing a formula on counting and summation of values of columns in a dataframe" and linked it to this one. Looking forward to hearing from you. – Allin May 19 '21 at 16:59
  • Please take a look at the link https://stackoverflow.com/questions/67607668/condition-function-inclusing-a-formula-on-counting-and-summation-of-values-of-co – Allin May 19 '21 at 17:34