I am building an ML classifier. For that, I have a dataset split into 6 .jsonl files, each larger than 1.6 GB. At first I tried the following code:
import pandas as pd
data = pd.read_json("train_features_0.jsonl")
This raised a "Trailing data" ValueError.
So I used the "chunksize" and "lines" arguments of "read_json":
import pandas as pd
data = pd.read_json("train_features_0.jsonl", chunksize=100, lines=True)
But now, instead of a DataFrame, this gives me a "pandas.io.json.json.JsonReader at 0x136bce302b0" object.
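From what I understand, the object returned when "chunksize" is set is an iterator that yields DataFrame chunks, so I think it has to be consumed in a loop. This is what I have been experimenting with (the tiny file here is made up just to illustrate):

```python
import json
import tempfile
import pandas as pd

# Create a small stand-in .jsonl file (one JSON object per line)
# in place of the real train_features_0.jsonl.
rows = [{"id": i, "feature": i * 2} for i in range(10)]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
    path = f.name

# With chunksize set, read_json returns a JsonReader (an iterator),
# not a DataFrame; iterating it yields DataFrames of up to 4 rows each.
reader = pd.read_json(path, lines=True, chunksize=4)
chunks = [chunk for chunk in reader]

print(len(chunks))                   # 3 chunks: 4 + 4 + 2 rows
print(sum(len(c) for c in chunks))   # 10 rows in total
```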
The dataset consists of: train_features_0.jsonl, train_features_1.jsonl, train_features_2.jsonl, train_features_3.jsonl, train_features_4.jsonl, train_features_5.jsonl.
So my question is: how can I use all of those .jsonl files to train my classifier?
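What I have in mind is something like the following, looping over each file chunk by chunk so that no single file has to fit in memory at once (the file names and contents below are made up for illustration; I am not sure this is the right approach):

```python
import json
import os
import tempfile
import pandas as pd

# Stand-ins for the six real files: create two tiny .jsonl files.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(2):
    path = os.path.join(tmpdir, f"train_features_{i}.jsonl")
    with open(path, "w") as f:
        for j in range(5):
            f.write(json.dumps({"file": i, "row": j}) + "\n")
    paths.append(path)

# Read every file in chunks; each chunk is an ordinary DataFrame.
n_rows = 0
for path in paths:
    for chunk in pd.read_json(path, lines=True, chunksize=3):
        n_rows += len(chunk)  # here I would train incrementally instead

print(n_rows)  # 10 rows across both files
```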
Another question: how can I use only specific "name:value" pairs while training my classifier? I mean, can I drop some name:value pairs to speed up the training process?
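To clarify what I mean: I think each "name" ends up as a DataFrame column, so dropping pairs would look something like this (the column names here are made up):

```python
import pandas as pd

# A hypothetical DataFrame with three name:value pairs per record.
df = pd.DataFrame([
    {"text": "abc", "length": 3, "source": "web"},
    {"text": "de", "length": 2, "source": "book"},
])

# Keep only the columns (name:value pairs) to train on ...
features = df[["text", "length"]]

# ... or, equivalently, drop the ones that are not needed.
features_alt = df.drop(columns=["source"])

print(list(features.columns))      # ['text', 'length']
print(list(features_alt.columns))  # ['text', 'length']
```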
Please pardon me, I am new to ML.