-1

I am trying to create a new column in a DataFrame from an existing column, where each new value is a summation from 0 up to the number in that existing column.

I was trying something like this:

dataset['B']=sum([1/i for i in range(dataset['A'])])

I know something like this would work: dataset['B']=sum([1/i for i in range(10)])

but I want to make this 10 dynamic, based on a different column.

I keep getting this error:

TypeError: 'Series' object cannot be interpreted as an integer

Arun
  • 11
  • 4

2 Answers

0
  1. First of all, I should admit that I could not understand your question completely. As far as I understood, you want to iterate over the rows of a DataFrame and build a new column by performing some operation(s) on each value. If that is so, then I would recommend the following link

  2. Regarding TypeError: 'Series' object cannot be interpreted as an integer: range() takes integers as input, i.e. [i for i in range(10)] should give you [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Passing the whole column dataset['A'], which is a Series rather than a single integer, raises this error, and a float value would raise a similar TypeError. Moreover, if you notice, the first value is zero, so 1/i should result in a different error (ZeroDivisionError). As a result, you might have to rewrite the code as [1/i for i in range(1, row_value_of_dataset['A'])]
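The two points above can be sketched in plain Python. This is only an illustration: the helper name partial_harmonic is hypothetical and not part of the question, and it assumes the value passed in is a single integer of at least 1.

```python
def partial_harmonic(n):
    """Sum of 1/i for i = 1 .. n-1; n must be a single integer."""
    # Starting the range at 1 avoids dividing by zero.
    return sum(1 / i for i in range(1, n))


# Starting at 0, as in the question, fails on the very first term.
zero_error = None
try:
    sum(1 / i for i in range(3))
except ZeroDivisionError as exc:
    zero_error = exc

print(partial_harmonic(4))   # 1/1 + 1/2 + 1/3
print(type(zero_error).__name__)
```

Called with one integer at a time, the helper works; the question's version fails because range() receives the whole column at once.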

It would be highly appreciated if you could give an example of what your DataFrame might look like and what your desired output is. Then it would perhaps be easier to post a solution.

BTW, I forgot to post what I understood from your question:

# assume the data:
>>> import pandas as pd
>>> data = pd.DataFrame({'A': (1, 2, 3, 4)})
# the data
>>> data
   A
0  1
1  2
2  3
3  4
# apply the operation to each row
>>> data['B'] = data.apply(lambda row: sum([1/i for i in range(1, row.A)]), axis=1)
# column B is the newly added data
>>> data
   A         B
0  1  0.000000
1  2  1.000000
2  3  1.500000
3  4  1.833333

Ahsan
  • 47
  • 5
  • Thanks for the solution. This method is too slow, as I am working on very big data. Can you please suggest an optimized way? – Arun Jul 02 '19 at 11:09
  • use numba in numpy and then bring it into the dataframe – M__ Jul 02 '19 at 13:36
0

Perhaps explicitly use cumsum, or even apply?
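One way the cumsum idea could look, as a rough sketch rather than a tested recommendation: build a lookup table of running sums once, then index it with column A. The column names come from the question; the sample values and the names n_max and table are assumptions for illustration, and the sketch assumes A holds integers of at least 1.

```python
import numpy as np
import pandas as pd

# Sample data (assumed); column A holds integers >= 1.
dataset = pd.DataFrame({"A": [1, 2, 3, 4]})

n_max = int(dataset["A"].max())

# table[k] = 1/1 + 1/2 + ... + 1/k, with table[0] = 0.0
table = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n_max))))

# sum(1/i for i in range(1, A)) has A - 1 terms, so look up table[A - 1].
dataset["B"] = table[dataset["A"].to_numpy() - 1]

print(dataset)
```

The table is computed once, so each row costs a single array lookup instead of a Python loop. The trade-off is memory proportional to the largest value in A.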

Anyway, you are trying to move an array/list item directly into a dataframe and seem to treat it as a dictionary. Try something like this (I've not tested it):

import numpy as np
import pandas as pd
array_x = [(x, 1/x) for x in dataset['A'].tolist()]  # pair each value with its reciprocal
df = pd.DataFrame(data=np.asarray(array_x))
df.columns = ['A', 'B']

Here the idea is to break the Series apart into a list and feed that list into a dataframe. The same thing can be done directly, without going Series -> list -> dataframe, and this round trip is not very efficient.
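The direct route mentioned above might look like the following sketch. It mirrors the snippet in this answer, so B is the reciprocal 1/A, not the summation from the question. The sample values are assumed, and A must not contain zeros.

```python
import pandas as pd

# Sample data (assumed); no zeros, since each value is inverted.
dataset = pd.DataFrame({"A": [1, 2, 4]})

# Arithmetic on a Series is element-wise, so no list round trip is needed.
df = dataset.assign(B=1 / dataset["A"])

print(df)
```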

M__
  • 614
  • 2
  • 10
  • 25