
Given a DataFrame df:

    d       e       f
a    0     [2]     [3]
b  [1]       0     [3]
c  [0]  [2, 3]  [3, 1]

I simply want to concatenate the values along axis=1 to get this result:

    d       e       f    appended
a    0     [2]     [3]   [0,2,3]
b  [1]       0     [3]   [1,0,3]
c  [0]  [2, 3]  [3, 1]   [0,2,3,3,1]

Surprisingly, df['appended'] = df.sum(axis=1) would do it if not for the 0 values (which aren't lists); with them present, it returns zero for each row.

I know this is a dumb question, but I've taken up pandas just recently, and I am yet to get a feel for it.

Can you suggest anything please?

@EDIT

Yes, I tried to replace zeros with a list (although I'd rather not do that because I need those zeros to stay zeros in my original df, and creating a new df may not be the best option?):

def mk_list(x):
    if not isinstance(x, list):
        x = [x]
    return x 

df2 = df.apply(mk_list)

Anyway, this produced all NaN, so I must be doing it wrong:

d    [[nan, nan, nan]]
e    [[nan, nan, nan]]
f    [[nan, nan, nan]]
nutship
  • Why not just replace the zeros with [0] first? – BrenBarn Feb 01 '14 at 19:50
  • I tried this, see my edit. – nutship Feb 01 '14 at 19:56
  • You would need to use `applymap` instead of `apply` to do it that way. But more generally, working with lists inside DataFrames can be somewhat awkward, and working with columns where some values are lists and some are numbers is also likely to be awkward. – BrenBarn Feb 01 '14 at 20:17
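As a rough illustration of the element-wise idea in the comment above, here is a minimal sketch that wraps non-list cells on the fly and concatenates each row, without modifying the original zeros. It deliberately uses a plain per-row loop rather than `applymap`; the helper names `as_list` and `concat_row` are made up for this example, and the DataFrame is rebuilt from the values shown in the question.

```python
import pandas as pd

def as_list(x):
    # Hypothetical helper: leave lists alone, wrap anything else (e.g. 0) in a list.
    return x if isinstance(x, list) else [x]

def concat_row(values):
    # Hypothetical helper: flatten one row's cells into a single list.
    out = []
    for v in values:
        out.extend(as_list(v))
    return out

# Rebuild the example frame from the question (mixed ints and lists, object dtype).
df = pd.DataFrame(
    {"d": [0, [1], [0]], "e": [[2], 0, [2, 3]], "f": [[3], [3], [3, 1]]},
    index=["a", "b", "c"],
)

# Build the new column row by row; the original cells are read, never changed.
appended = [concat_row(row) for row in df.itertuples(index=False)]
df["appended"] = pd.Series(appended, index=df.index)

print(df)
```

Because the wrapping happens only while building the new column, the zeros in `d`, `e`, and `f` stay as plain zeros in `df`.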

1 Answer


You can apply your values to your matrix by looping through the list of values you want to apply and appending them to the matrix inside the loop.

# matrice collects one row (a list) per element of ls
matrice = []

# list whose length determines the number of rows
ls = [1, 2, 3, 4]

for i in range(len(ls)):
    # start a new, empty row
    matrice.append([])
    for j in range(i):
        # row i receives the i values i, i+1, ..., 2*i - 1
        matrice[i].append(i + j)
print(matrice)

Secondly, the NaN array you get after normalization (row / row.sum()) appears because you first assigned values to your matrix and then called the normalization on it. You need to do it the other way around: normalize the matrix first, and then call the matrix function again.

1. Create a matrix and assign some values
[[ 0.  2.  0.]
 [ 0.  0.  2.]
 [ 1.  1.  0.]]

matrice = stat_function()
2. Normalize rows, row /= row.sum()
[[ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]]

matrice = normalize_function()
3. Call the same matrix again
[[ 0.   1.   0. ]
 [ 0.   0.   1. ]
 [ 0.5  0.5  0. ]]
matrice = stat_function()
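For reference, the row normalization in step 2 might look like the sketch below, assuming the matrix is a NumPy float array holding the values from step 1. It does not reproduce `stat_function` or `normalize_function`, which are not shown here. One common way to get NaN from `row / row.sum()` is a row whose sum is zero (0/0), so the sketch skips such rows; that guard is an addition for illustration, not something taken from the steps above.

```python
import numpy as np

# Values from step 1, as a float array.
matrice = np.array([[0., 2., 0.],
                    [0., 0., 2.],
                    [1., 1., 0.]])

# Sum of each row, kept as a column so it broadcasts against the rows.
row_sums = matrice.sum(axis=1, keepdims=True)

# Divide each row by its sum; rows summing to zero are left as zeros
# instead of producing 0/0 = nan.
normalized = np.divide(matrice, row_sums,
                       out=np.zeros_like(matrice),
                       where=row_sums != 0)

print(normalized)
```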
user1749431
  • Hi, thanks for the answer, though are you sure you posted this in the relevant thread? This does not address my problem. – nutship Feb 02 '14 at 09:01
  • It depends. You can convert your DataFrame to an array; it might be easier to append values and normalize them that way. See the thread http://stackoverflow.com/questions/13187778/pandas-dataframe-to-numpy-array-include-index – user1749431 Feb 02 '14 at 12:29