
I think I've got the idea of the one-line for loop, but now I have a problem. I know I can define a DataFrame column with it, like this:

import pandas as pd

df = pd.DataFrame(columns=["columnA"])

list = [0, 1, 2, 3, 4]

df["columnA"] = [i for i in list]

Now my question is: Is it possible to define 2 columns in a one-line for loop?

I've tried this:

df["columnA"], df["columnB"] = [i, i**2 for i in list]
df["columnA"], df["columnB"] = [[i, i**2] for i in list]

Neither of these worked. I'm using Python 3.10.

darioeu
  • Does this answer your question? [How to add multiple columns to pandas dataframe in one assignment?](https://stackoverflow.com/questions/39050539/how-to-add-multiple-columns-to-pandas-dataframe-in-one-assignment) – Chris Jan 27 '23 at 14:32
  • If these are the only values you need, this should work (assuming two different columns so you don't overwrite the other result): `df["columnA"], df["columnB"] = ([i**n for i in list] for n in [1, 2])` – B Remmelzwaal Jan 27 '23 at 14:37
  • `df["columnA"], df["columnA"] = ...` - looks like you are trying to assign to the same column twice, is that intentional? – wwii Jan 27 '23 at 14:38
  • It's not a "one-line for loop"; it's a [list comprehension](https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries), and it always produces exactly one list. You can *process* that list, though, to produce two iterables. – chepner Jan 27 '23 at 14:40
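To make chepner's point concrete, here is a minimal sketch of why the two attempts in the question fail. It assumes a variable named `values` standing in for the question's `list`; the first attempt is compiled from a string only so the SyntaxError can be caught instead of stopping the script.

```python
values = [0, 1, 2, 3, 4]  # stands in for the question's `list`

# Attempt 1: `[i, i**2 for i in values]` is not valid syntax, so compile it
# from a string to observe the SyntaxError without crashing this script.
try:
    compile("[i, i**2 for i in values]", "<attempt 1>", "eval")
    attempt1_error = None
except SyntaxError as exc:
    attempt1_error = type(exc).__name__

# Attempt 2: the comprehension builds ONE list of five [i, i**2] pairs,
# and five items cannot be unpacked into two targets.
pairs = [[i, i**2] for i in values]
try:
    col_a, col_b = pairs
    attempt2_error = None
except ValueError as exc:
    attempt2_error = type(exc).__name__

print(attempt1_error)  # SyntaxError
print(attempt2_error)  # ValueError
```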

3 Answers


You can zip your output:

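lst = [0, 1, 2, 3, 4]  # the question's list, renamed so it doesn't shadow the built-in
df = pd.DataFrame()
df['A'], df['B'] = zip(*[(i, i**2) for i in lst])
print(df)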

# Output
   A   B
0  0   0
1  1   1
2  2   4
3  3   9
4  4  16
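What `zip(*...)` does here: the comprehension builds a single list of `(i, i**2)` pairs, and unpacking that list into `zip` transposes it into one tuple of first elements and one tuple of second elements, which are then assigned to the two columns. A small sketch without pandas, using a hypothetical `values` list:

```python
values = [0, 1, 2, 3, 4]  # same numbers as the question's list
pairs = [(i, i**2) for i in values]  # [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
col_a, col_b = zip(*pairs)  # transpose the row pairs into two column tuples
print(col_a)  # (0, 1, 2, 3, 4)
print(col_b)  # (0, 1, 4, 9, 16)
```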

You can also use np.array:

import numpy as np
df[['A', 'B']] = np.array([(i, i**2) for i in lst])
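A variant of the same idea, sketched under the assumption that you don't need the empty DataFrame first: the array has one row per `i` (shape `(5, 2)`), so it can be passed straight to the constructor with column names. `values` is a hypothetical stand-in for the question's list.

```python
import numpy as np
import pandas as pd

values = [0, 1, 2, 3, 4]  # stands in for the question's `list`
arr = np.array([(i, i**2) for i in values])  # one row per i
print(arr.shape)  # (5, 2)

# Each column of the 2-D array becomes one DataFrame column.
df = pd.DataFrame(arr, columns=["A", "B"])
print(df)
```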
Corralien

Here is my solution to your problem:

1: Column creation

Create the columns when you create the DataFrame; that is much faster than adding them afterwards.

import pandas as pd

list = [0, 1, 2, 3, 4]
df = pd.DataFrame({
    "columnA":list,
    "columnB":[i**2 for i in list]
})

By testing it with %%timeit we obtain:

161 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Now, let's check your version:

df = pd.DataFrame(columns=["columnA"])

list = [0, 1, 2, 3, 4]

df["columnA"] = [i for i in list]
df["columnB"] = [i**2 for i in list]

1.58 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Your version is roughly 10x slower (1.58 ms vs. 161 µs).
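`%%timeit` is an IPython/Jupyter magic. Outside a notebook, a similar comparison can be sketched with the standard-library `timeit` module. The function names below are hypothetical, `values` replaces the question's `list`, and the absolute numbers will depend on your machine and pandas version.

```python
import timeit

import pandas as pd

values = [0, 1, 2, 3, 4]  # renamed from `list` so the built-in stays usable


def build_in_constructor():
    # Both columns are passed to the constructor at once.
    return pd.DataFrame({"columnA": values, "columnB": [i**2 for i in values]})


def build_incrementally():
    # The question's approach: empty frame, then one column at a time.
    df = pd.DataFrame(columns=["columnA"])
    df["columnA"] = [i for i in values]
    df["columnB"] = [i**2 for i in values]
    return df


for fn in (build_in_constructor, build_incrementally):
    # Best of 3 repeats of 200 calls each, reported per call.
    per_call = min(timeit.repeat(fn, number=200, repeat=3)) / 200
    print(f"{fn.__name__}: {per_call * 1e6:.0f} µs per call")
```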

2: Using .assign

If you can't create all the columns when the DataFrame is created, you can still add several of them in a single call with .assign:

df = pd.DataFrame({
    "columnA" :[i for i in list]
}).assign(
    columnB = [i**2 for i in list],
    columnC = [i**3 for i in list]
)
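`.assign` also accepts callables, which receive the DataFrame being built, so the extra columns can be computed from `columnA` without any comprehension. A sketch assuming vectorized column arithmetic is acceptable for your real data; `values` stands in for the question's `list`.

```python
import pandas as pd

values = [0, 1, 2, 3, 4]  # stands in for the question's `list`
df = pd.DataFrame({"columnA": values}).assign(
    # Each callable receives the DataFrame built so far.
    columnB=lambda d: d["columnA"] ** 2,
    # Later keywords can refer to columns created by earlier ones.
    columnC=lambda d: d["columnB"] * d["columnA"],
)
print(df)
```

Because keyword arguments are applied in order, `columnC` can use `columnB` even though both are created in the same call.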

3: Single for

If you really want to use a single comprehension, you can build the row data first and the DataFrame afterwards:

data = [
    {
        "columnA":i,
        "columnB":i**2
    } for i in list
]
df = pd.DataFrame(data)
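The same single-comprehension idea also works with plain tuples instead of dicts, as long as the column names are passed separately. A sketch, with `values` standing in for the question's `list`:

```python
import pandas as pd

values = [0, 1, 2, 3, 4]  # stands in for the question's `list`
rows = [(i, i**2) for i in values]  # one (columnA, columnB) tuple per row
df = pd.DataFrame(rows, columns=["columnA", "columnB"])
print(df)
```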

Finally, list is a Python built-in type (not a keyword), so you should avoid using it as a variable name. Once you shadow it, you lose access to the actual built-in, so these won't work:

list(iter([1,2,3])) (converts an iterable into a list)

isinstance([1,2,3],list) (checks whether an object is a list)
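A quick sketch of the failure in a script, and of how to undo it: assigning to `list` at module level hides the built-in until that name is deleted.

```python
list = [0, 1, 2, 3, 4]  # shadows the built-in `list` in this module

try:
    list(iter([1, 2, 3]))  # calls the list *object*, not the type
except TypeError as exc:
    shadow_error = str(exc)
print(shadow_error)  # 'list' object is not callable

del list  # drop the module-level name; lookups fall back to the built-in again
restored = list(iter([1, 2, 3]))
print(restored)  # [1, 2, 3]
```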

Nilo Araujo
  • Really useful examples for boosting the code. This answers my question of which way is faster. Thanks! – darioeu Jan 30 '23 at 13:40

Right now your code is overwriting what's in Column A.

df["columnB"], df['columnA'] = [i**2 for i in list], [i for i in list]

The above answer is much better than mine. Learned something new today.

MichaelB