I need some advice regarding updating a large number of records in a MySql database. I am currently storing a large number of words and their suffixes (a suffix array) into a database which results in a row count of approximately 4.3 million. Each record contains the primary key id
, the actual word word
, the document the word is in document
, the offset of the word within the document 'offset', a flag which determines whether the record is a whole word or not flag
and a link
to the next record with the same value for word
. Each record is initialized with a link value of -1
.
This is my current code for updating the links in the database:
public void Link(object c)
{
DBConnection conn = (DBConnection)c;
rowcount = conn.GetRowCount();
string word;
int link;
List<Record> recordsList = new List<Record>();
List<Record> recordsMatched = new List<Record>();
for (int i = 0; i < rowcount; i++)
{
recordsList.AddRange(conn.ReadQuery("SELECT * FROM csa2018.words WHERE id = " + i));
word = recordsList[0].Word;
link = recordsList[0].Link;
recordsMatched = conn.ReadQuery("SELECT * FROM csa2018.words WHERE word = '" + word + "'");
for(int j = 0; j < recordsMatched.Count-1; j++)
{
if (recordsMatched[j].Link == -1)
{
conn.WriteQuery("UPDATE csa2018.words SET link = " + recordsMatched[j + 1].Id + " WHERE id = " + recordsMatched[j].Id);
}
else
{
break;
}
linkedRecords++;
}
linkedRecords++;
recordsMatched.Clear();
recordsList.Clear();
}
Form1.linkingFinished = true;
}
Overall, it has good performance when it finds words which are repeated frequently; however at around 60% the performance deteriorates because most of the remaining words are unique. My guess is that this query:
recordsMatched = conn.ReadQuery(
"SELECT * FROM csa2018.words WHERE word = '" + word + "'");
shouldn't be like this because it is being called once for every row. Are there any better approaches like using stored procedures maybe?
P.S.: the ReadQuery
method reads rows using the query supplied and constructs a Record
object and adds each record to a List<Record>
.
This is what my database looks like :
CREATE TABLE words ( id int(11) NOT NULL, word varchar(45) NOT NULL,
document varchar(45) NOT NULL, offset int(11) NOT NULL, flag int(11) NOT NULL,
link int(11) DEFAULT NULL, PRIMARY KEY (id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8