From a loop I get a variable A (named aaaa in the code below), which is a list of pd.Series:

import numpy as np
import pandas as pd

aa = pd.Series(np.random.randn(5))
aaaa = []
aaaa.append(aa.loc[[1]])  # list indexer, so each element is a one-row Series
aaaa.append(aa.loc[[4]])
aaaa

[1    0.07856
 dtype: float64, 4    0.94552
 dtype: float64]

Now I would like to sum up (or do any other calculation on) the elements within A. I tried the built-in sum function, but it does not give the expected result. For instance,

B = sum(aaaa)

gives me

1   NaN
4   NaN
dtype: float64

I found the question below and its solutions. However, they do not work for my problem, because the asker there has only one list, not several pd.Series appended to each other (with different index labels).

Summing elements in a list

edit4: As I have to run this multiple times, I timed both answers:

%timeit sum([i.values for i in aaaa])
3.78 µs ± 5.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit pd.concat(aaaa).sum()
560 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Surprisingly, the list comprehension inside sum is much faster than pd.concat(aaaa).sum().

edit5: To add, in case someone else has the same problem: if it is not known whether the input is a pd.Series or a list of pd.Series, one could do the following:

res = sum(aa) if isinstance(aa, pd.Series) else sum([i.values for i in aa])
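The one-liner above can also be wrapped in a small function. The following is only a sketch under assumptions: the name `total` is hypothetical, and unlike the one-liner it concatenates the raw values first, so it returns a plain float even when the pieces have different lengths.

```python
import numpy as np
import pandas as pd


def total(x):
    """Sum a single Series or a list of Series to one float (hypothetical helper)."""
    if isinstance(x, pd.Series):
        return float(x.sum())
    # Concatenate the raw values so index labels play no role in the sum
    return float(np.concatenate([s.to_numpy() for s in x]).sum())


s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])   # fixed values instead of random ones
parts = [s.loc[[1]], s.loc[[4]]]           # list of one-element Series

print(total(s))      # 15.0
print(total(parts))  # 7.0
```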
eternity1
  • Hi jpp, apologies for that. There must be something incorrect, otherwise it would work, right? I deleted the 'sum' and included what it can look like. Maybe that's clearer in terms of the desired result. In fact, I would have loved to show a sample of how I could produce 'A', but that's a product of part of my code. – eternity1 Apr 08 '18 at 00:26
  • Could you advise on this particular example? I stated what I have as input, what result I'd like to have, and a currently proposed method which does not work (see adjustment above). – eternity1 Apr 08 '18 at 00:35
  • "... which seems to be a numpy array" look at the source code and figure out what you're dealing with before asking a question. If you don't know the data type of your "mystery" variable, and don't include the code that generates it, then how is anyone else supposed to know? The entire point of a [MCVE] is that it presents an isolated, easily testable problem. If you don't have that, then the question *probably* isn't ready to be asked. – AnOccasionalCashew Apr 08 '18 at 00:47
  • My best guess at first glance is that you have a list of mixed data types, of which only some are supported by `sum(arg)`. I'm not entirely clear what your input or expected output is though. – AnOccasionalCashew Apr 08 '18 at 00:49
  • actually, I hoped you guys would know. but never mind, I figured out how to reproduce the input variable. Just to stress, appending is done in this example randomly, in the original code, it's a condition to be. I think now the problem can be tested. – eternity1 Apr 08 '18 at 01:24
  • Just for future reference, if you run into intermittent problems when using a randomized code path, part of putting together an MCVE is selecting a specific (fixed, i.e. not random) example that triggers the problem, and writing a standalone code snippet with only that. Then people can help you with the actual issue at hand. – AnOccasionalCashew Apr 08 '18 at 01:33
  • As to this question, I'm not familiar with Pandas so you'll have to wait for someone who is to see this. It looks like your issue has to do with the data types used by Pandas `Series` not being compatible with the built-in `sum`. – AnOccasionalCashew Apr 08 '18 at 01:35

2 Answers

You are passing a list to pd.Series.loc (aa.loc[[1]]), which causes your list elements to be pd.Series instead of scalars.

Try using pd.Series.iloc for integer indexing:

s = pd.Series(np.random.randn(5))

A = []
A.append(s.iloc[1])
A.append(s.iloc[4])

res = sum(A)

Note you could perform this calculation directly via pd.Series.sum:

res = s.iloc[[1, 4]].sum()

If you have a list of pd.Series, you can use:

res = pd.concat(A).sum()
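As a minimal sketch of the list-of-Series case, assuming fixed values in place of the random ones and the hypothetical name `pieces` for the list:

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])
pieces = [s.loc[[1]], s.loc[[4]]]   # list of one-element Series

combined = pd.concat(pieces)        # one Series that keeps index labels 1 and 4
res = combined.sum()                # 20.0 + 50.0

print(list(combined.index))  # [1, 4]
print(res)                   # 70.0
```

pd.concat stacks the pieces end to end rather than aligning them on their index, which is why the differing labels cause no NaN here.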
jpp
  • thanks, jpp. Would you have an idea, how I could make it work using the pd.Series instead of making the input a scalar, i.e. some way of transformation of A? – eternity1 Apr 08 '18 at 01:56
  • Yes, the solution `res = s.iloc[[1, 4]].sum()` uses `pd.Series` directly. You don't need to make a list and append items individually. Or, build your list `[1, 4]` and use it as an input. – jpp Apr 08 '18 at 01:57
  • thanks for your prompt reply. I do get A as a list of pd.Series from another part of my program; so, I cannot directly access 's'; so the only number I have at the end (without having to modify the other part of the code) is my 'aaaa' the list of pd.Series – eternity1 Apr 08 '18 at 02:00
  • Thank you, yes, that also delivers the same result as ahed87's suggestion. – eternity1 Apr 08 '18 at 02:05
There are many ways to get out of your predicament, and only you will know which one is best for you.

When you do aa.loc[[1]] you end up with a pd.Series; if you do aa.loc[1] you get a scalar. The same holds for .iloc.

So just dropping the second pair of square brackets in aa.loc[[1]] will make your code work.
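The difference between the two indexers can be checked directly. This is a small sketch with fixed values instead of the random ones from the question:

```python
import pandas as pd

aa = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

as_series = aa.loc[[1]]   # list indexer: one-element Series, index label kept
as_scalar = aa.loc[1]     # scalar label: a single floating-point value

print(type(as_series).__name__)  # Series

# With scalars in the list, the built-in sum works as expected
total = sum([aa.loc[1], aa.loc[4]])
print(total)  # 7.0
```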

sum needs an iterable of numbers to work. So if you want to keep the second pair of square brackets, the following line will work as well, although you will now get a one-element NumPy array instead of a float as the answer.

sum([i.values for i in aaaa])
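The NaN result in the question comes from index alignment: adding two Series matches values by index label, and labels 1 and 4 have no partner in the other Series. A short sketch with fixed values shows both the alignment effect and the .values workaround:

```python
import numpy as np
import pandas as pd

aa = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
aaaa = [aa.loc[[1]], aa.loc[[4]]]

# Series addition aligns on index labels; 1 and 4 never match, so both are NaN
aligned = aaaa[0] + aaaa[1]
print(list(aligned.index))   # [1, 4]
print(aligned.isna().all())  # True

# .values drops the index, so plain element-wise NumPy addition applies
arr = sum([s.values for s in aaaa])
print(arr.shape)      # (1,)
print(float(arr[0]))  # 7.0
```

Taking arr[0] (or calling arr.item()) turns the one-element array into a plain number if a float is needed.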

ahed87
  • Thanks, ahed87; the straightforward way would be, as jpp suggested, not to use 'loc'. However, right now the list of pd.Series is what I get (without the need to modify the other part of the code). – eternity1 Apr 08 '18 at 02:03