
I'm trying to extract the values at specified indices from each array in a specific dataframe column.

As a dummy example, suppose my dataframe has a column called 'arr' in which each array below is one row:

[1, 2, 3, 4, 5]

[6, 7, 8, 9, 10]

[11, 12, 13, 14, 15]

[16, 17, 18, 19, 20]
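
For reproducibility, a frame like the one described above could be built as follows. This is only a sketch: it assumes each row of 'arr' holds a plain Python list (the rows could equally be NumPy arrays), and the variable name `df` matches the attempts below.

```python
import pandas as pd

# Assumed reconstruction of the example: one list per row in column 'arr'
df = pd.DataFrame({
    "arr": [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20],
    ]
})
print(df)
```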

I've tried:

for row in df.itertuples(): 
    i1 = [0,1,2]
    r1 = np.array(df.arr)[i1]

    i2 = [2,3]
    r2 = np.array(df.arr)[i2]

which gives the rows 0, 1 and 2 from the dataframe.

And I've tried:

for row in df.itertuples(): 
    i1 = [0,1,2]
    r1 = np.array(row.arr)[i1]

    i2 = [2,3]
    r2 = np.array(row.arr)[i2]

which gives the values from only the last row. I don't understand why.

What I want is the elements at the indices specified in i1 and i2, stored in two different variables (r1 and r2), for each row. So-

r1 should give-

[1, 2, 3]

[6, 7, 8]

[11, 12, 13]

[16, 17, 18]

And r2 should give-

[3, 4]

[8, 9]

[13, 14]

[18, 19]

I've also used iterrows() with no luck.

Sp_95
  • Does this answer your question? [How to iterate over rows in a DataFrame in Pandas?](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – bug_spray May 31 '20 at 18:00
  • No, because the answers in the link don't talk about extracting values from arrays for each row. – Sp_95 May 31 '20 at 18:04
  • @Sp_95 do you want two columns for each row, where 1st column contains the **df[a]** whereas 2nd column contains **df[b]** ? – snehil May 31 '20 at 18:07
  • I would like 2 columns containing the values of r1 and r2. So basically the extracted elements. – Sp_95 May 31 '20 at 18:09
  • Is there a reason you can't just filter your dataframe with `loc` or `iloc`? – bug_spray May 31 '20 at 18:09
  • Please post an example of your desired output. – bug_spray May 31 '20 at 18:10
  • @Sp_95 is this what you want: https://imgur.com/a/XHfaXsx? Each row in the r1 column has three elements and each row in the r2 column has two elements. – snehil May 31 '20 at 18:35
  • @Snehil yes! Each row in r1 should have the elements from the indices specified in i1 ( so 3 elements) and each row in r2 should have the elements from the indices specified in i2 (2 elements). – Sp_95 May 31 '20 at 18:42

2 Answers

If you want columns r1 and r2 in the same dataframe, you can use:

import numpy as np
import pandas as pd

# Example frame; 'arr' holds a four-element list per row
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
# Elements 0-2 of each list go to r1, elements 2-3 to r2
df['r1'] = df['arr'].apply(lambda x: x[0:3])
df['r2'] = df['arr'].apply(lambda x: x[2:4])

I have applied a lambda that does the slicing. Is this what you want?

If you want a new dataframe with columns r1 and r2, you can use:

from operator import itemgetter

import numpy as np
import pandas as pd

a = [0, 1, 2]
b = [2, 3]
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
# itemgetter(*a) picks the listed positions from each row and returns them as a tuple
data = pd.DataFrame()
data['r1'] = df['arr'].apply(itemgetter(*a))
data['r2'] = df['arr'].apply(itemgetter(*b))
data

Does this edit help?
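
One detail worth knowing about itemgetter: it returns tuples, and with a single index it returns a bare value instead of a one-element tuple. If lists are preferred, a small helper is one alternative. The sketch below is a swapped-in technique (a list comprehension), not the code above; the helper name `take` and the two-row frame are made up for illustration.

```python
from operator import itemgetter

import pandas as pd

# Small stand-in frame (assumed data, two rows of the question's example)
df = pd.DataFrame({"arr": [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]})
i1, i2 = [0, 1, 2], [2, 3]


def take(indices):
    """Hypothetical helper: build a function that picks `indices` from a sequence as a list."""
    return lambda seq: [seq[i] for i in indices]


result = pd.DataFrame({
    "r1": df["arr"].apply(take(i1)),
    "r2": df["arr"].apply(take(i2)),
})

# For comparison: itemgetter gives a tuple, or a bare value for one index
pair = itemgetter(*i2)([1, 2, 3, 4, 5])
single = itemgetter(*[2])([1, 2, 3, 4, 5])
print(result)
print(pair, single)
```

The index lists stay in i1 and i2, so nothing is hard-coded inside the function passed to apply.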

snehil
  • If I have another array from which I pull the indices and store them in different variables, how can I apply that to the lambda function? In other words, is there a way to avoid explicitly specifying [0:3] and [2:4] and instead use the variables where I saved the indices (i1 and i2 in my case)? – Sp_95 May 31 '20 at 18:58
  • The output is exactly what I want, except I don't want to hard code those indices into the solution. – Sp_95 May 31 '20 at 19:04
  • Thanks for your solution, but what I'm looking for is this: is there a way to keep i1=[0, 1, 2] and use i1 in the lambda function instead of coding it as [0:3] for r1, and similarly for r2? – Sp_95 May 31 '20 at 19:14
  • @Sp_95 see the new edit in the answer. I have used a and b directly instead of slicing, so this should solve your problem. – snehil May 31 '20 at 19:19

Try:

import numpy as np

i1, i2 = [0, 1, 2], [2, 3]
number_rows = len(df)
rows = np.array(df.arr)  # convert once; assumes each row of df.arr is a NumPy array
r1, r2 = np.zeros((number_rows, 3)), np.zeros((number_rows, 2))
for i in range(number_rows):
    r1[i] = rows[i][i1]
    r2[i] = rows[i][i2]

The problem with your first attempt is that indexing an array like np.array(df.arr) with only one list of indices selects along the first axis, so it returns the whole row for each index.
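
To make the difference concrete, here is a minimal sketch on a plain 2D array holding the same numbers as the question's example (the array `arr` is built here for illustration and is not taken from the dataframe):

```python
import numpy as np

arr = np.arange(1, 21).reshape(4, 5)  # same values as the example rows
i1 = [0, 1, 2]

rows = arr[i1]     # selects along axis 0: whole rows 0, 1 and 2
cols = arr[:, i1]  # selects along axis 1: elements 0, 1, 2 of every row

print(rows.shape)  # (3, 5)
print(cols.shape)  # (4, 3)
```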

In your second attempt you do compute the values you want for each row, but every iteration overwrites r1 and r2, so only the last row's values survive. You can fix this by storing each row's result in the result arrays, as done above.
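
The same idea can be kept closer to the itertuples loop from the question by collecting each row's result instead of reassigning it. This is a sketch under the assumption that the rows are lists or arrays of equal length; the example frame is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Assumed example data matching the question
df = pd.DataFrame({
    "arr": [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20],
    ]
})
i1, i2 = [0, 1, 2], [2, 3]

r1, r2 = [], []
for row in df.itertuples():
    values = np.asarray(row.arr)  # handles list rows and array rows alike
    r1.append(values[i1])         # append keeps the results of earlier rows
    r2.append(values[i2])

r1 = np.vstack(r1)  # one row of results per dataframe row
r2 = np.vstack(r2)
print(r1)
print(r2)
```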

simorius
  • Ah, I see what you mean by overwriting previous results. I tried out your solution but it gives the error 'list indices must be integers or slices, not list' – Sp_95 May 31 '20 at 19:17
  • Hm, that is weird, because it works for me. Maybe you forgot to convert your rows to NumPy arrays: a NumPy array can be indexed with a list, whereas a Python list cannot. But I see the other solution worked for you, and it is actually more elegant ;) – simorius Jun 02 '20 at 07:13
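
The error discussed in these comments can be reproduced when the rows of 'arr' are plain Python lists. A sketch, assuming that is the asker's situation (the thread does not confirm it): np.array on such a column yields a 1-D object array whose elements are still lists, so converting through tolist() first gives a real 2D array.

```python
import numpy as np
import pandas as pd

# Assumed data: rows stored as Python lists, not NumPy arrays
df = pd.DataFrame({"arr": [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]})
i1 = [0, 1, 2]

obj = np.array(df["arr"])  # 1-D object array; each element is still a list
try:
    obj[0][i1]             # indexing a list with a list
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

stacked = np.array(df["arr"].tolist())  # true 2D array, one row per list
r1 = stacked[:, i1]                     # elements 0, 1, 2 of every row
print(r1)
```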