0

I have a database table with columns dbValues and dbKeys. The dbKeys value is a string containing keys separated with ;.

dbKeys 1: "name;age;color;...price"
dbKeys 2: "city;street;age;...favourite_meal"
...

dbValues has same formatting as dbKeys but contains values for this keys.

dbValues 1: "Peter;18;blue;...64"
dbValues 2: "London;Main;40;...applepie"
...

There are more than 1 000 000 000 rows. I need to delete some keys and its values from all of the records where the key is, for example the age key, so the result would be:

dbKeys 1: "name;color;...price"
dbKeys 2: "city;street;...favourite_meal"
...

dbValues 1: "Peter;blue;...64"
dbValues 2: "London;Main;...applepie"
...

Do you have any recommendation how to modify all the records with the specific key value in the dbKeys column in the most effective way? I prefer some SQL, or something in .NET / C#.

My solution is to create for-cycle over all the records (or over every 1000 records) and change every record separately. But it doesn't look very effective to me. So I'm opened for any other suggestion.

Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
Banana Cake
  • 1,172
  • 1
  • 12
  • 31
  • 18
    This is why you should normalize your database. – HoneyBadger Nov 19 '19 at 13:43
  • 2
    Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the [How to Ask](https://stackoverflow.com/help/asking) page for help clarifying this question. – Trevor Nov 19 '19 at 13:44
  • @Çöđěxěŕ I'm looking for the most effective way, how to rewrite substrings in more than 1000 000 000 records in the db. – Banana Cake Nov 19 '19 at 13:46
  • 2
    @BananaCake I understand, but unfortunately we don't have enough information to help; this could be done numerous ways. – Trevor Nov 19 '19 at 13:48
  • 2
    probably better to do it in stages. stop creating entries with wrong keys. stop modifying entries. single user mode mayhap. create a temp table containing all index to a row that has to change. change one after the other. You do not tell us where your probelm is nor what approach you did - [edit] your post to be more specific. And - just mabye - test tthis on a non-production copy before you do it. – Patrick Artner Nov 19 '19 at 13:48
  • 1
    You say it's a Database, but it seems like a semicolon-separated values – Ashkan Mobayen Khiabani Nov 19 '19 at 14:01
  • @Çöđěxěŕ My solution is to create for-cycle over all the records (or over every 1000 records) and change every record separately. But it doesn't look very effective to me. So I'm open for any other suggestion. – Banana Cake Nov 19 '19 at 14:05
  • @AshkanMobayenKhiabani It is database, where is some table with columns dbValues and dbKeys. The semicolon-separated values are the values of this columns. – Banana Cake Nov 19 '19 at 14:06
  • 1
    The algorithm is easy enough. Get all entries of dbKeys LIKE '%[keyToRemove]%'. For each entry, split by `;`, find [keyToRemove], remember the index, remove it and Join again. Then get all entries of dbValues for that key, split, remove the index and Join again. The "fun" part will be the writing back, as that could potentially be pretty slow. How about bulk insert into a temporary table and then merge into the "real" table? -- something like: https://stackoverflow.com/questions/20635796/bulk-update-in-c-sharp – Corak Nov 19 '19 at 14:07
  • One might hope that this is a one time operation and would not be the order of the day. Given that its a one off "the most efficient" is a luxury where the academic calisthenics of figuring out what that is would be better served getting the actual work started making the changes. – Joe Johnston Nov 19 '19 at 14:09
  • @JoeJohnston Fortunately it is only one time operation. – Banana Cake Nov 19 '19 at 14:15
  • @HoneyBadger Yes, I know, I got this project right now and I can't change the db, because it is connected with many components and I need to work with it as it is... – Banana Cake Nov 19 '19 at 14:18

1 Answers1

4

This will be really difficult in SQL but pretty easy in C#. Just:

  • Load the record
  • Split the values (for both columns) into lists with String.Split()
  • Loop through the keys. If there is a match, delete the index from both lists.
  • Join the lists back into strings with String.Join()
  • Save the record

You might want to save in a batch rather than individually.

Of course storing the data in this form is not ideal in the first place. But you probably already know that.

JacquesB
  • 41,662
  • 13
  • 71
  • 86