
I want to create a set of n columns in a DataFrame each assigned a separate value using a list comprehension.

#My original dataframe
df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})

   A  B
0  1  4
1  2  5
2  3  6
#Expected output - 

pd.concat([df, pd.DataFrame(np.tile(np.array([5,10,15,20,25])[:,None], 3).T)], axis=1)

   A  B  0   1   2   3   4
0  1  4  5  10  15  20  25
1  2  5  5  10  15  20  25
2  3  6  5  10  15  20  25

I need to do it in this fashion -

#ROUGH structure of the code that I am looking for -
n = "number of columns i want to add"
df[[i for i in range(n)]] = numpyarray #whose shape is (n,3)

The error that I face is quite obvious -

KeyError: "None of [Int64Index([0, 1, 2], dtype='int64')] are in the [columns]"

#AND

SyntaxError: can't assign to list comprehension

I have read other solutions which allow adding multiple columns but this one specifically needs a loop with an iterator of n because -

  1. The data frame may need 25 columns added and that doesn't depend on the array of values
  2. The array of values can be (3, 15), which means that the last 10 of the columns will not take their values from the array
  3. The preferred solution would be a list comprehension, since the list of columns that I would be creating (25 for example) comes from a list-comprehension-based iterator
Akshay Sehgal
  • You cannot do what you ask for. Pandas does not allow creating multiple columns at once. – DYZ Sep 04 '20 at 04:37
  • That's one answer for it, but I am still not sure why my question is closed. – Akshay Sehgal Sep 04 '20 at 04:38
  • [related](https://stackoverflow.com/questions/39050539/how-to-add-multiple-columns-to-pandas-dataframe-in-one-assignment). If using loops, maybe you can consider not transposing the array, then zip, iterate and assign, something like `arr = np.tile(np.array([5,10,15,20,25])[:,None], 3)`, `for a,b in zip(range(5),arr): df[a] = b` – anky Sep 04 '20 at 04:52
  • @AkshaySehgal I guess you need array of shape `(3, n)`...Because you are assigning `n` columns – Shubham Sharma Sep 04 '20 at 04:53
  • @anky I went through that post and while most of those work for me, the issue is that I am not aware of how many columns I need to add. Also, the array that will be used to assign values may or may not represent the number of columns that the dataframe will have. So, if the array is (3,10), it may happen that only the first 10 new generated columns of the dataframe are filled, while 5 more columns remain Nan. – Akshay Sehgal Sep 04 '20 at 05:02
  • @ShubhamSharma yes you are right. – Akshay Sehgal Sep 04 '20 at 05:02
  • @AkshaySehgal I see. I think you can still use the `zip` solution I commented along with `reindex()` on `axis=1` to fill the remaining columns with the default NaN value. Maybe creating a better example related to your comment will make it clearer. – anky Sep 04 '20 at 05:10
  • pd.__version__ = 1.0.3 – Akshay Sehgal Sep 04 '20 at 08:37
  • @AkshaySehgal, your original code can be tweaked a bit to get the result. `np.tile(np.array([[5*i] for i in range(1,n+1)]), len(df)).T` will give you the desired result. You just need to replace the hardcoded values with a list comprehension and change the hardcoded 3 to `len(df)`. – Joe Ferndz Sep 05 '20 at 07:37

2 Answers


One idea for creating the columns with a list comprehension, tested in pandas 1.1.1:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})

#list created by list comprehension
L = [i + 1 for i in range(5)]
print (L)
[1, 2, 3, 4, 5]

n = len(L)
df[list(range(n))] = L

print (df)
   A  B  0  1  2  3  4
0  1  4  1  2  3  4  5
1  2  5  1  2  3  4  5
2  3  6  1  2  3  4  5
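
If the list-key assignment raises the KeyError shown in the question on your pandas version, one possible workaround is to build the new columns as a separate frame and concatenate it. This is only a sketch under that assumption; the names `new_cols` and `result` are illustrative and not part of the approach above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Values produced by a list comprehension, as above
L = [i + 1 for i in range(5)]

# One column per value; passing index= lets each scalar be repeated down the rows
new_cols = pd.DataFrame({i: value for i, value in enumerate(L)}, index=df.index)

# Attach the new columns to the right of the existing ones
result = pd.concat([df, new_cols], axis=1)
print(result)
```

Because the new frame shares `df.index`, the rows line up without any further alignment step.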
jezrael
  • interesting I came up with the same answer. :) – Joe Ferndz Sep 04 '20 at 05:45
  • @JoeFerndz - Hmm, I think it is different, I assign list `L` without for loop. – jezrael Sep 04 '20 at 05:45
  • Sadly, this doesn't work for me (`pd.__version__` = 1.0.3). It throws the error `KeyError: "None of [Int64Index([0, 1, 2, 3, 4], dtype='int64')] are in the [columns]"`. It would be interesting to know what changed in pandas to allow this functionality. – Akshay Sehgal Sep 04 '20 at 08:36
  • @AkshaySehgal - 2 ideas: does it work to pre-assign values like `df[list(range(n))] = 1` and then `df[list(range(n))] = L`? Another idea is `df.loc[:, list(range(n))] = L` – jezrael Sep 04 '20 at 08:38
  • See, the iterator for me comes as the output of a function – Akshay Sehgal Sep 04 '20 at 09:13
  • @AkshaySehgal - Tested with `i = iter([i + 1 for i in range(5)])`, `print (i)`; for me it works after converting it to a list: `L = list(i)`, `print (L)`, `n = len(L)`, `df[list(range(n))] = L` – jezrael Sep 04 '20 at 09:18

Here's an updated version of the solution.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
print(df)
n = 10

df = pd.concat([df,pd.DataFrame(
    np.tile([5*(i+1) for i in range(n)],len(df)).reshape(len(df),n),
    columns=[i+1 for i in range (n)])],axis=1)

print(df)

The output from this is as follows:

Original DataFrame:

   A  B
0  1  4
1  2  5
2  3  6

Merged dataframe

   A  B  1   2   3   4   5   6   7   8   9  10
0  1  4  5  10  15  20  25  30  35  40  45  50
1  2  5  5  10  15  20  25  30  35  40  45  50
2  3  6  5  10  15  20  25  30  35  40  45  50

We need to get a table with values [5,10,15,...,n*5]. To achieve this, I am using:

np.tile([5*(i+1) for i in range(n)],len(df))

This will give me an array like this:

array([ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,  5, 10, 15, 20, 25, 30, 35,
       40, 45, 50,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

Now we need to switch this to 3 rows by n columns where n=10 in this example. I am doing that using:

reshape(len(df),n)

Here len(df) = 3 and n = 10

The result of

np.tile([5*(i+1) for i in range(n)],len(df)).reshape(len(df),n)

will be :

array([[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
       [ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
       [ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]])
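
As a side note, the same 2-D array can probably be built without the separate reshape step. The sketch below assumes `n_rows` stands in for `len(df)`; it compares the tile-then-reshape approach with `np.tile` given a 2-D repetition tuple and with `np.broadcast_to`, which returns a read-only view.

```python
import numpy as np

n = 10
n_rows = 3  # stands in for len(df) in the answer above

values = [5 * (i + 1) for i in range(n)]

# Approach from the answer: repeat the flat list, then fold it into rows
tiled = np.tile(values, n_rows).reshape(n_rows, n)

# Same result in one call: repeat n_rows times along a new first axis
stacked = np.tile(values, (n_rows, 1))

# Read-only view with the same contents; copy it before modifying
view = np.broadcast_to(values, (n_rows, n))

print(np.array_equal(tiled, stacked))
print(np.array_equal(tiled, view))
```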

Now that I have the values listed, I just need to get the column names. I am using a list comprehension to create the column names.

columns=[i+1 for i in range(n)]

We also have to use axis=1; otherwise the new frame is appended as extra rows instead of extra columns.
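
To illustrate the effect of the axis argument, here is a small sketch comparing the two directions. The variable names `extra`, `side_by_side` and `stacked` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
n = 10

# The block of new values with column labels 1..n
extra = pd.DataFrame(
    np.tile([5 * (i + 1) for i in range(n)], (len(df), 1)),
    columns=[i + 1 for i in range(n)])

# axis=1 aligns on the row index and places the frames next to each other
side_by_side = pd.concat([df, extra], axis=1)

# axis=0 stacks the frames vertically; columns missing in one frame become NaN
stacked = pd.concat([df, extra], axis=0)

print(side_by_side.shape)
print(stacked.shape)
```

With axis=0 the result has six rows, and columns A and B are NaN in the rows that came from the second frame.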

Putting all this together gives you the final result set.

I went back and tried to use Akshay's logic. Here's what I got. This also works.

df2 = pd.concat([df,pd.DataFrame(
    np.tile(np.array([[5*i] for i in range(1,n+1)]), len(df)).T,
    columns=[i+1 for i in range (n)])],axis=1)
print(df2)

If you think there are easier ways to do this, please let me know so I can learn as well.

The previous response is below:

I am fairly new to pandas and still learning to figure things out. Here's what I tried and it looks like this is what you want.

import pandas as pd
df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
lst = [5,10,15,20,25]
n = 6
for i in range(1, n):
    df[i] = lst[i-1]
print(df)

This gave me the following output:

   A  B  1   2   3   4   5
0  1  4  5  10  15  20  25
1  2  5  5  10  15  20  25
2  3  6  5  10  15  20  25
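
The loop above fails with an IndexError once n - 1 exceeds the number of values. For the case raised in the question, where the array has fewer columns than the number to be added, a possible sketch is to reindex the new block to the full set of column labels so the missing ones are filled with NaN. This is an assumption about the desired behaviour; it uses 0-based labels as in the question's expected output, and `arr` and `extra` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

n = 5                                      # number of columns to add
arr = np.tile([5, 10, 15], (len(df), 1))   # only 3 columns of values, shape (3, 3)

# Columns 0..2 come from arr; reindex adds the labels 3 and 4 filled with NaN
extra = pd.DataFrame(arr, index=df.index).reindex(columns=list(range(n)))

result = pd.concat([df, extra], axis=1)
print(result)
```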

Does this make sense and is this what you are looking for?

Joe Ferndz