0

First off, I apologize if the title isn't accurate but I'm struggling to find the words to accurately describe my problem.

I'm working on a project where I'm trying to record Twitter information that I have compiled from a scraper and is currently in CSV format.

I convert that information into JSON using csvtojson, but I'm running into a problem.

Each word in the tweet I'm trying to separate by itself and then apply the rest of the information from that specific row in the CSV to that specific word.

Here's an example of a "successful" outcome:

enter image description here

I'm just not sure how to split up the array I've created where I can apply each word as a new row to the array. I'm assuming I need to separate out how I'm inserting the data, but I'm not sure how to do that?

const fileName = "items.csv";

csvtojson().fromFile(fileName).then(source => {

// Console log initial CSV data
// console.log(source);

for (var i = 0; i < source.length; i++) {
    var tweets__contentText = source[i]["tweets__contentText"],
        tweets__conversationId = source[i]["tweets__conversationId"],
        tweets__replies = source[i][" tweets__replies"],
        tweets__retweets = source[i]["tweets__retweets"],
        tweets__favorites = source[i]["tweets__favorites"],
        tweets__dateTime = source[i]["tweets__dateTime"],
        tweets__tweetId = source[i]["tweets__tweetId"]

    var tweets__modified = tweets__contentText.match(/\b(\w+)\b/g);

    console.log(tweets__modified);

    var insertStatement = `INSERT INTO ctdata values(?, ?, ?, ?, ?, ?, ?, ?)`;
    var items = [tweets__modified, tweets__contentText, tweets__conversationId, tweets__replies, tweets__retweets, tweets__favorites, tweets__dateTime, tweets__tweetId];

    // Insert data of current row into database
    db.query(insertStatement, items, (err, results, fields) => {
        if (err) {
            console.log("Unable to insert item at row ", i + 1);
            return console.log(err);
        }
    });
}
console.log("All items stored into database successfully!");
});
Moose
  • 63
  • 1
  • 7

1 Answers1

1

To insert multiple records, the SQL format is something like:

INSERT INTO table_name (column_name_1, column_name_2, column_name_3)
VALUES
    (?, ?, ?),
    (?, ?, ?),
    (?, ?, ?)
    ...
;

So we need to construct the query in above format, according to which I have re-written a part of your code with the changes

var tweets__modified = tweets__contentText.match(/\b(\w+)\b/g);
console.log(tweets__modified);

// creating a string for numbers of records we want to create
var sql_insert_statement = tweets__modified.map((record) => '(?, ?, ?, ?, ?, ?, ?, ?)').join(', ');

// appending the above string to create a final SQL query
var insertStatement = `INSERT INTO ctdata ("tweets__singleWord", "tweets__contentText", "tweets__conversationId", "tweets__replies", "tweets__retweets", "tweets__favorites", "tweets__dateTime", "tweets__tweetId") values ${sql_insert_statement}`;

// creating a SQL query data in which 
// we have all the columns data for a record 
// times the number of recrods
var insertStatementItems = tweets__modified.reduce((acc, record) => { 
    acc = [...acc, record, tweets__contentText, tweets__conversationId, tweets__replies, tweets__retweets, tweets__favorites, tweets__dateTime, tweets__tweetId];
    return acc; // was missing
}, []);

console.log({ insertStatement, insertStatementItems });

// Insert data of current row into database
db.query(insertStatement, insertStatementItems, (err, results, fields) => {
    if (err) {
        console.log("Unable to insert item at row ", i + 1);
        return console.log(err);
    }
});

Note: But mind you, there is another issue with your code, If you are expecting for this line console.log("All items stored into database successfully!"); to be printed after all the insert operation, that is wrong. Since it's async code, this line will be printed first before any operation is executed. I have added two pieces of code to fix that.

If your environment support async/await, then this code might help you

const fileName = "items.csv";

csvtojson().fromFile(fileName)
.then(async (source) => {
    
    // Console log initial CSV data
    // console.log(source);
    
    db_insert_promises = []
    
    for (var i = 0; i < source.length; i++) {
        var tweets__contentText = source[i]["tweets__contentText"],
        tweets__conversationId = source[i]["tweets__conversationId"],
        tweets__replies = source[i][" tweets__replies"],
        tweets__retweets = source[i]["tweets__retweets"],
        tweets__favorites = source[i]["tweets__favorites"],
        tweets__dateTime = source[i]["tweets__dateTime"],
        tweets__tweetId = source[i]["tweets__tweetId"]
        
        var tweets__modified = tweets__contentText.match(/\b(\w+)\b/g);
        console.log(tweets__modified);
        
        var sql_insert_statement = tweets__modified.map((record) => '(?, ?, ?, ?, ?, ?, ?, ?)').join(', ');
        var insertStatement = `INSERT INTO ctdata values ${sql_insert_statement}`;
        var insertStatementItems = tweets__modified.reduce((acc, record) => { 
            acc = [...acc, record, tweets__contentText, tweets__conversationId, tweets__replies, tweets__retweets, tweets__favorites, tweets__dateTime, tweets__tweetId];
            return acc; // was missing
        }, []);
        
        // Insert data of current row into database
        db_insert_promise_for_tweet = await new Promise(function(resolve, reject) {
            db.query(insertStatement, insertStatementItems, (err, results, fields) => {
                if (err) return reject(err);
                return resolve(results, fields);
            });
        });
    }
    
    console.log("All items stored into database successfully!");
})
.catch(console.error);

If your environment doesn't support async/await, then this code might help you

const fileName = "items.csv";

csvtojson().fromFile(fileName)
.then(source => {
    
    // Console log initial CSV data
    // console.log(source);

    db_insert_promises = []
    
    for (var i = 0; i < source.length; i++) {
        var tweets__contentText = source[i]["tweets__contentText"],
        tweets__conversationId = source[i]["tweets__conversationId"],
        tweets__replies = source[i][" tweets__replies"],
        tweets__retweets = source[i]["tweets__retweets"],
        tweets__favorites = source[i]["tweets__favorites"],
        tweets__dateTime = source[i]["tweets__dateTime"],
        tweets__tweetId = source[i]["tweets__tweetId"]
        
        var tweets__modified = tweets__contentText.match(/\b(\w+)\b/g);
        console.log(tweets__modified);
        
        var sql_insert_statement = tweets__modified.map((record) => '(?, ?, ?, ?, ?, ?, ?, ?)').join(', ');
        var insertStatement = `INSERT INTO ctdata values ${sql_insert_statement}`;
        var insertStatementItems = tweets__modified.reduce((acc, record) => { 
            acc = [...acc, record, tweets__contentText, tweets__conversationId, tweets__replies, tweets__retweets, tweets__favorites, tweets__dateTime, tweets__tweetId];
            return acc; // was missing
        }, []);
        
        // Insert data of current row into database
        db_insert_promise_for_tweet = new Promise(function(resolve, reject) {
            db.query(insertStatement, insertStatementItems, (err, results, fields) => {
                if (err) return reject(err);
                return resolve(results, fields);
            });
        });
        db_insert_promises.push(db_insert_promise_for_tweet);
    }

    return Promise.all(db_insert_promises);
})
.then((result_of_all_insert_query) => {
    console.log({ result_of_all_insert_query });
    console.log("All items stored into database successfully!");
})
.catch(console.error);
Raghav Garg
  • 3,601
  • 2
  • 23
  • 32
  • 1
    But mind you, there is another issue with your code, If you are expecting for this line `console.log("All items stored into database successfully!");` to be printed after all the insert operation, that is wrong. Since it's async code, this line will be printed first before any operation is executed. I will update my post to accommodate those changes. – Raghav Garg Oct 24 '20 at 05:39
  • Two things. 1. I don't entirely understand why the above code would work, do you have something you could reference that would explain it? I'd greatly appreciate it, I think I understand what's happening but I'm not sure why it's happening. 2. I seem to be getting a 1064 error now using the async code you wrote out: sqlMessage: "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?), (?,' at line 1" ----- Any suggestions? – Moose Oct 24 '20 at 19:34
  • 1
    for point 1, the query was you want to create multiple records, like a new record for every word in a tweet. So in my answer, I explained in order to create multiple records you need to formulate your query in above-given manner, like `INSERT INTO table (column) VALUES ('record_1'), ('recrod_2');`, so basically, we looped over every word in a tweet and created a desired SQL query format. I hope this link may help you https://stackoverflow.com/questions/452859. If you have any specific questions, please mention what is it (or which line) that is not clear for you. – Raghav Garg Oct 24 '20 at 20:03
  • for point 2, the SQL query was missing the column name, I have fixed it now, can you please retry and share if it worked? – Raghav Garg Oct 24 '20 at 20:04
  • Thanks for the link and explanation, that definitely will help. As for the code it still seems to be throwing the error. Could the problem with with the columns in the mysql database itself not being made correctly? – Moose Oct 24 '20 at 20:45
  • Can you share the error? is it still a syntax error? Also can you please share the value of `insertStatement`? It could be a possibility that db is not configured right, for that we can just run a simple SQL query like `INSERT INTO ctdata ("tweets__singleWord", "tweets__contentText", "tweets__conversationId", "tweets__replies", "tweets__retweets", "tweets__favorites", "tweets__dateTime", "tweets__tweetId") values ('tweet_word', 'tweet_word another_word', 561561, 2, 3, 4, 'date', 561561) ` in DB terminal. If this is not working you need to fix your DB, else we will need to debug the code. – Raghav Garg Oct 25 '20 at 04:32
  • I console.log'd insertStatement and received 33 (?,?,?,?,?,?,?,?) statements, the correct number of words that console.log(tweets__modified) is spitting out. The error I'm getting is still syntax related ---- code: 'ER_PARSE_ERROR', errno: 1064, sqlState: '42000', syntax; followed by the '?,?,?,? etc. On a side note I also just tried console.logging insertStatementItems and am receiving "undefined", maybe there's a mismatch somewhere? – Moose Oct 27 '20 at 01:34
  • Yeah, there was a missing `return acc;` statement from `reduce` because of which `insertStatementItems` is coming as undefined. I have fixed it, please try now. Also if you are getting a syntax error please share the whole string, only then I will be able to debug it. One more thing please try on a smaller example, like a tweet with 3-4 words, it would make debugging much faster.! – Raghav Garg Oct 27 '20 at 02:25
  • I went ahead and changed up the items.csv document to only 4 words and also copied the entire terminal output from the npm start command: https://pastebin.pl/view/5ce0e6a0 – Moose Oct 27 '20 at 02:29
  • Thanks for the output, there was an issue with the merging 2 arrays, replaced `acc += [` with `acc = [...acc, `, fixed it now. Can you please try again? – Raghav Garg Oct 27 '20 at 02:50
  • It worked a lot better this time around. The array is essentially built and the data is ready to be added, but for some reason it's getting a new syntax error? I triple checked every possible place tweets__replies is used and it's formatted correctly everywhere? https://pastebin.pl/view/cba6c675 – Moose Oct 27 '20 at 03:03
  • Please share the code of how are you initializing your instance `db`? – Raghav Garg Oct 27 '20 at 09:05