I'm reading a lot of log files, parsing each log into a dictionary, and adding these dictionaries to a DataFrame that I later use for analysis. The information I need in the DataFrame may differ each time based on user input, so I don't want every key in the dictionary added to the DataFrame — only the columns I define should be added.
As of now I append each dictionary to a list, then load that list into a DataFrame:
for log in log_lines:
    # logic to parse the log and build the dictionary d goes here
    my_dict_list.append(d)

df = pd.DataFrame(my_dict_list)
This way all the keys and their values end up in the DataFrame. What I want instead is to define some columns — say the user asks for ['a', 'b', 'c'] for analysis — and have only those keys and their values loaded into the DataFrame; the rest should be ignored.
my_dict_list = [{'a': 'abc', 'b': '123', 'c': 'hello', 'date': '20-5-2019'},
                {'a': 'dfc', 'b': '453', 'c': 'user', 'date': '23-5-2019'},
                {'a': 'bla', 'b': '2313', 'c': 'anything', 'date': '25-5-2019'}]
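To illustrate the result I'm after (a sketch using the sample data above, assuming the user asked for ['a', 'b', 'c']), the `columns` argument of the `pd.DataFrame` constructor keeps only the listed keys:

```python
import pandas as pd

# Sample parsed dictionaries, as in the example above.
my_dict_list = [{'a': 'abc', 'b': '123', 'c': 'hello', 'date': '20-5-2019'},
                {'a': 'dfc', 'b': '453', 'c': 'user', 'date': '23-5-2019'},
                {'a': 'bla', 'b': '2313', 'c': 'anything', 'date': '25-5-2019'}]

# Passing columns= keeps only the requested keys; 'date' is dropped
# without having to filter each dictionary beforehand.
df = pd.DataFrame(my_dict_list, columns=['a', 'b', 'c'])
print(df.columns.tolist())  # ['a', 'b', 'c']
```

But I'm not sure whether this is the fastest approach for a large number of dictionaries.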
Note: I don't want to drop these keys at log-extraction time, because I'll be extracting a lot of logs and that would be time-consuming.
Is there a faster way to achieve this using pandas?