Let us say I have the following columns in a data frame:
title
year
actor1
actor2
cast_count
actor1_fb_likes
actor2_fb_likes
movie_fb_likes
I want to select the following columns from the data frame and ignore the rest of the columns :
- the first 2 columns (title and year)
- some columns based on name - cast_count
- some columns which contain the string "actor1" - actor1 and actor1_fb_likes
I am new to pandas. For each of the above operations, I know what method to use. But I want to do all three operations together as all I want is a dataframe that contains the above columns that I need for further analysis. How do I do this?
Here is example code that I have written:
data = {
"title":['Hamlet','Avatar','Spectre'],
"year":['1979','1985','2007'],
"actor1":['Christoph Waltz','Tom Hardy','Doug Walker'],
"actor2":['Rob Walker','Christian Bale ','Tom Hardy'],
"cast_count":['15','24','37'],
"actor1_fb_likes":[545,782,100],
"actor2_fb_likes":[50,78,35],
"movie_fb_likes":[1200,750,475],
}
df_input = pd.DataFrame(data)
print(df_input)
df1 = df_input.iloc[:,0:2] # Select first 2 columns
df2 = df_input[['cast_count']] #select some columns by name - cast_count
df3 = df_input.filter(like='actor1') #select columns which contain the string "actor1" - actor1 and actor1_fb_likes
df_output = pd.concat(df1,df2, df3) #This throws an error that i can't understand the reason
print(df_output)