
I am working on a Windows machine. I am trying to load a pandas DataFrame that has one column of text (strings) and another column of labels (ints) into a PyTorch dataset. Assume the data is in a DataFrame named df.

I am using the following:

train_target = torch.DoubleTensor(df['Text'].values)

I also tried the following to load directly from a CSV file:

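import torch
from torchtext import data  # legacy torchtext API (Field, LabelField, TabularDataset)

TEXT1 = data.Field(tokenize = 'spacy', include_lengths = True)
LABEL1 = data.LabelField(dtype = torch.float)
# Field names must be strings; the original had label unquoted.
fields = [('text',TEXT1),('label',LABEL1)]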
train_data, valid_data, test_data = data.TabularDataset.splits(
                                        path = '.',
                                        train = '1.csv',
                                        validation = '1.csv',
                                        test = '1.csv',
                                        format = 'csv',
                                        fields = fields,
                                        skip_header = True
)

But it raises the following error:

OverflowError: Python int too large to convert to C long

I am looking for a way to convert such text into a PyTorch tensor object.
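For context on why the direct conversion fails: a tensor holds numbers, so each string has to be mapped to integer indices first. Below is a minimal sketch of that mapping using only the standard library. The whitespace tokenizer, the `<pad>` and `<unk>` tokens, and all helper names are assumptions made for illustration, not part of torchtext or of the code above.

```python
from collections import Counter

# Toy rows standing in for df['Text'] and df['Label'] (from the sample CSV).
texts = ["My Name is rahul", "My name is julia"]
labels = [0, 1]

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def tokenize(sentence):
    # Lowercase whitespace split; a simple stand-in for the spaCy tokenizer.
    return sentence.lower().split()


def build_vocab(sentences):
    # Assign one integer id per distinct token, after the two special tokens.
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    vocab = {PAD: 0, UNK: 1}
    for tok in sorted(counts):
        vocab[tok] = len(vocab)
    return vocab


def encode(sentence, vocab, max_len):
    # Map tokens to ids, truncate to max_len, then right-pad with the PAD id.
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(sentence)]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(texts)
max_len = max(len(tokenize(s)) for s in texts)
encoded = [encode(s, vocab, max_len) for s in texts]
print(encoded)
```

The rows of `encoded` all have the same length and contain only integers, so a list like this can be passed to `torch.tensor(encoded, dtype=torch.long)`, with `labels` converted the same way.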

Fábio Perez
  • Do you have numbers larger than `sys.maxsize` in `LABEL1`? See https://stackoverflow.com/questions/38314118/overflowerror-python-int-too-large-to-convert-to-c-long-on-windows-but-not-ma – Fábio Perez Sep 18 '19 at 11:10
  • Also, please post the top 10 lines of your csv here. – Fábio Perez Sep 18 '19 at 11:12
  • The problem is with TEXT1. I believe that while collecting the text data it cannot handle the strings, since tensors only hold types such as int, float, and bool. My CSV contains only the lines below: ```Text Label My Name is rahul 0 My name is julia 1``` – Lalit Somnathe Sep 18 '19 at 12:59
  • How can I directly use pandas dataframes (train, validation, test) for torch, and process them for deep learning (change each to text, label fields, and build vocab)? – mah65 Apr 04 '21 at 12:21
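On the `sys.maxsize` point raised in the comments: on Windows a C long is 32 bits even under 64-bit Python, so passing `sys.maxsize` to an API that stores its argument in a C long raises this exact OverflowError. One place this can happen is `csv.field_size_limit`. The sketch below assumes the traceback ends inside CSV reading, which the question does not confirm, and the helper name is hypothetical.

```python
import csv
import sys


def set_max_csv_field_size():
    # Start from sys.maxsize and shrink until the value fits in a C long.
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # Too large for this platform's C long; try a smaller value.
            limit //= 10


applied = set_max_csv_field_size()
print(applied)
```

On platforms where a C long is 64 bits the first attempt succeeds, so the loop only shrinks the value where it is needed.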

0 Answers