0

I use LOAD DATA LOCAL INFILE and imported a large TXT file, which works great. It imports the data in the mydata Mysql table using the following

LOAD DATA LOCAL INFILE 'Base.txt' INTO TABLE mydata CHARACTER 
SET
   latin1 FIELDS TERMINATED BY '\t' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES (keywords , fixedprice, quantity, description, condition1, image1url, merchantcategory, leadtime, Manufacturer, Model_Number, AZ_Code, Volts, Watts, ColorTemp, Shape, Life, Base 

)

Before importing the data is it possible to replace the stripoutwords in the Keywords and the description column and remove it. So check if select name from stripoutwords EXISTS in the Keywordsand thedescription column, If EXISTs then remove those keywords from the import

The schema for both tables is here:

CREATE TABLE `stripoutwords` (
  `id` int(11) NOT NULL,
  `name` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `mydata` (
  `keywords` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  `fixedprice` float(6,2) DEFAULT NULL,
  `quantity` varchar(255) DEFAULT NULL,
  `description` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  `condition1` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL,
  `image1url` text CHARACTER SET latin1 COLLATE latin1_general_ci,
  `merchantcategory` text CHARACTER SET latin1 COLLATE latin1_general_ci,
  `leadtime` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL,
  `Manufacturer` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL,
  `Model_Number` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL,
  `AZ_Code` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL,
  `Volts` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL,
  `Watts` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL,
  `ColorTemp` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL,
  `Shape` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL,
  `Life` varchar(400) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL,
  `Base` varchar(255) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
user580950
  • 3,558
  • 12
  • 49
  • 94
  • Please [edit] your question to present an example of a few rows of input data and the result you want. – O. Jones Sep 06 '21 at 21:36
  • And, see this. https://stackoverflow.com/questions/2139069/how-to-skip-columns-in-csv-file-when-importing-into-mysql-table-using-load-data and this. https://dev.mysql.com/doc/refman/8.0/en/load-data.html#load-data-input-preprocessing – O. Jones Sep 06 '21 at 21:38

1 Answers1

1

You have to pre processs the data with a function

you would write a sql query like this

LOAD DATA LOCAL INFILE 'Base.txt' INTO TABLE mydata CHARACTER SET   latin1 
FIELDS TERMINATED BY '\t' 
ENCLOSED BY '"' 
LINES TERMINATED BY '\n' 
IGNORE 1 LINES 
(@keywords , fixedprice, quantity, @description, condition1, image1url, merchantcategory, leadtime, Manufacturer, Model_Number, AZ_Code, Volts, Watts, ColorTemp, Shape, Life, Base 
)
SET keywords = Removetext(@keywords),description = Removetext(@description)

with a function like

CREATE function Removetext (
     texttofilter text
) RETURNS TEXT DETERMINISTIC
BEGIN
    DECLARE finished INTEGER DEFAULT 0;
    DECLARE stopwords TEXT DEFAULT "";

    -- declare cursor for employee email
    DEClARE curlist 
        CURSOR FOR 
            SELECT `name` FROM `stripoutwords`;

    -- declare NOT FOUND handler
    DECLARE CONTINUE HANDLER 
        FOR NOT FOUND SET finished = 1;

    OPEN curlist;

    getwordlist: LOOP
        FETCH curlist INTO stopwords;
        IF finished = 1 THEN 
            LEAVE getwordlist;
        END IF;
        -- build email list
        SET texttofilter = REPLACE (texttofilter,stopwords,"");
    END LOOP getwordlist;
    CLOSE curlist;
RETURN texttofilter;
END

Here is an example how the function works https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=28cf0a3ba53afcffcb12740e8d07b912

nbk
  • 45,398
  • 8
  • 30
  • 47