Given a dataframe such as the following:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2'],
                    'C': ['C0', 'C1', 'C2']},
                   index=[0, 1, 2])
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2
I want to add a column 'D' initialized to False, which will be used in later processing of the dataframe:
    A   B   C      D
0  A0  B0  C0  False
1  A1  B1  C1  False
2  A2  B2  C2  False
I generated a list of False values based on the df1 index, used it to create a second dataframe df2, and then concatenated the two:
Dlist = [False for item in list(range(len(df1.index)))]
d = {'D':Dlist}
df2 = pd.DataFrame(d, index = df1.index)
result = pd.concat([df1, df2], axis=1, join_axes=[df1.index])
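(As an aside for anyone reproducing this: the `join_axes` argument was removed in pandas 1.0, so the line above errors out on current versions. A minimal self-contained sketch of the same approach, using `.reindex(df1.index)` in its place, would be:)

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2'],
                    'C': ['C0', 'C1', 'C2']},
                   index=[0, 1, 2])

# Build the 'D' column as its own dataframe, aligned to df1's index
df2 = pd.DataFrame({'D': [False for item in df1.index]}, index=df1.index)

# join_axes=[df1.index] is gone in pandas >= 1.0; reindex gives the same alignment
result = pd.concat([df1, df2], axis=1).reindex(df1.index)
```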
A couple of questions: does the list comprehension in the first line need to be so involved? I tried the following, thinking that df1.index is list-like, but it didn't work:
Dlist = [False for item in df1.index]
More broadly, is there a better way to do this with dataframe operations? If I were dealing with a CSV file containing the data for df1, I could easily add 'D' to the file before generating the dataframe.
In terms of philosophy, is modifying dataframes in place, or the CSV files they came from, unavoidable when processing data? It certainly doesn't seem like a good idea when dealing with very large files.