Here is my solution to your problem:
1: Column creation
Create the column together with the dataframe; it is much faster than adding the column later:
import pandas as pd

list = [0, 1, 2, 3, 4]
df = pd.DataFrame({
    "columnA": list,
    "columnB": [i**2 for i in list]
})
Timing it with %%timeit, we obtain:
161 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Now, let's check your version:
df = pd.DataFrame(columns=["columnA"])
list = [0, 1, 2, 3, 4]
df["columnA"] = [i for i in list]
df["columnB"] = [i**2 for i in list]
1.58 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Your version is roughly 10x slower (1.58 ms versus 161 µs).
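If you are not in a notebook, the same comparison can be sketched with the standard timeit module. This is only a rough benchmark: the function names and the name values are mine (used instead of list), and the absolute numbers will differ with your machine and pandas version.

```python
import timeit

import pandas as pd

values = [0, 1, 2, 3, 4]  # hypothetical name, chosen to avoid shadowing `list`


def build_at_creation():
    # Both columns are handed to the constructor in one go.
    return pd.DataFrame({
        "columnA": values,
        "columnB": [i**2 for i in values],
    })


def add_columns_later():
    # Empty dataframe first, columns assigned one by one afterwards.
    df = pd.DataFrame(columns=["columnA"])
    df["columnA"] = values
    df["columnB"] = [i**2 for i in values]
    return df


n = 200  # number of repetitions per variant
t_create = timeit.timeit(build_at_creation, number=n) / n
t_later = timeit.timeit(add_columns_later, number=n) / n

print(f"at creation: {t_create * 1e6:.0f} µs per call")
print(f"added later: {t_later * 1e6:.0f} µs per call")
print(f"ratio: {t_later / t_create:.1f}x")
```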
2: Using .assign
If you cannot create all columns when the dataframe is created, you can create multiple columns in a single call with .assign:
df = pd.DataFrame({
    "columnA": [i for i in list]
}).assign(
    columnB=[i**2 for i in list],
    columnC=[i**3 for i in list]
)
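As a variation, .assign also accepts callables, which receive the dataframe and can derive the new columns from an existing one instead of from the original Python list. A minimal sketch, assuming the new columns only depend on columnA (the name values is mine):

```python
import pandas as pd

values = [0, 1, 2, 3, 4]  # hypothetical name, chosen to avoid shadowing `list`

df = pd.DataFrame({"columnA": values}).assign(
    # Each callable is called with the dataframe and returns the new column.
    columnB=lambda d: d["columnA"] ** 2,
    columnC=lambda d: d["columnA"] ** 3,
)

print(df)
```

This uses vectorized column arithmetic rather than a list comprehension, which tends to matter more as the data grows.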
3: Single for
If you really want to use a single for loop, you can build the data first and the dataframe afterwards:
data = [
    {
        "columnA": i,
        "columnB": i**2
    } for i in list
]
df = pd.DataFrame(data)
Finally, list is already a Python built-in name, so you should avoid overwriting it. Once you rebind it, you lose access to the actual function and type, so these won't work:
list(iter([1,2,3]))
(converts an iterable into a list)
isinstance([1,2,3],list)
(checks that the variable is of the list type)
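To see the failure concretely, here is a small sketch. The shadowing is kept inside a function (shadowing_demo is a name I made up) so it does not leak into the rest of your session; builtins.list still reaches the real type even while the name is shadowed.

```python
import builtins


def shadowing_demo():
    list = [0, 1, 2, 3, 4]  # local name hides the built-in inside this function

    try:
        list(iter([1, 2, 3]))  # now tries to call a list object
    except TypeError as exc:
        call_error = str(exc)

    try:
        isinstance([1, 2, 3], list)  # second argument is no longer a type
    except TypeError as exc:
        isinstance_error = str(exc)

    # The real type remains reachable through the builtins module.
    recovered = builtins.list(iter([1, 2, 3]))
    return call_error, isinstance_error, recovered


call_error, isinstance_error, recovered = shadowing_demo()
print(call_error)
print(isinstance_error)
print(recovered)
```

Renaming the variable (values, numbers, ...) avoids the problem altogether.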