1

I have a MySQL table with the structure as follows:

CREATE TABLE `info` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(70) NOT NULL,
  `email` varchar(128) NOT NULL,
  `country` varchar(80) NOT NULL,
  `state` varchar(50) NOT NULL,
  `city` varchar(40) NOT NULL,
  `address` varchar(256) NOT NULL,
  `phone` varchar(15) NOT NULL,
  `regdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`),
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

The problem is, that I have the country column containing the country and the state info, like this:

UNITED STATES/Ca

But I want to separate them in their columns (explode them by / and update the first result in the country field, than insert the second result in the state field).

Can that be done only by using MySQL? (I have over 4 500 000 entries in the table)

Zoli
  • 1,081
  • 1
  • 8
  • 28

4 Answers4

3

To update all rows, but avoiding rows where either column is empty, you can use:

UPDATE info
SET 
country = SUBSTRING_INDEX(country, "/", 1),
state = SUBSTRING_INDEX(country, "/", -1)
WHERE country <> "" AND state = ""

Did some slight dabbling, and it appears the better way to ensure that the existing data is protected is to only apply the change where state is empty and country is not empty. This prevents states already populated from being changed (even if that leaves a country with value of "United States/TX" when state has value of "TX" (or "CA" or "Bluegrass Home").

Also, my dabblings found that

SELECT SUBSTRING_INDEX("United States", "/", -1);
SELECT SUBSTRING_INDEX("United States", "/",  1);

will both return "United States" so either have some check that "/" exists (using index check of some kind) or have complicated logic that confirms that derived values for temp_country and temp_state are not equivalent.

I'm sure there is something more elegant and less error-prone. Still poking around for native functions. I think a set of domain rules would be good. Something like:

  1. country can contain "Country/State" OR "Country" OR ""
  2. state can contain "State" Or ""
  3. If state contains "" and country contains "Country/State", update country to just Country and Update state to State value in Country.
  4. If state contains "" and country contains "Country", leave both alone
  5. If state contains "State" and country contains "Country/State" > This is your predicament : Do you favor the "/State" value in country column, or "State" value in state column? If you favor leaving state as is, do you wipe out "/State" value in country column so that it's just "Country" or do you favor the value of "/State" in country column and wipe out existing "State" value in state column?

This will update the state where the state is currently empty and the country has Country/State:

UPDATE info
SET 
state = IF(SUBSTRING_INDEX(country, "/", -1) = country, state, SUBSTRING_INDEX(country, "/", -1)),
country = IF(SUBSTRING_INDEX(country, "/", -1) = country, country, SUBSTRING_INDEX(country, "/", 1))
WHERE state = ""

It gets a bit more bold and froggy if you remove that state = "" condition, as it then forces you to add logic to determine if current value in state should hold or be overwritten by state from country field.

Anthony
  • 36,459
  • 25
  • 97
  • 163
  • As your answer helped me solving this, I mark your answer as the solution. The solution was: _UPDATE info SET state = SUBSTRING_INDEX( country, "/", -1 ), country = SUBSTRING_INDEX( country, "/", 1 ) WHERE country LIKE "%%/%%" AND state = ""_. Note, that the state and the country fileds needs to be in inverse order: first the state field updated than the country field. Thanks the help! – Zoli Jun 16 '15 at 09:53
  • 1
    Indeed, just discovered the importance of the updates as well. Adding My last attempt for you to compare. – Anthony Jun 16 '15 at 09:55
1

You can try this function:

SUBSTRING_INDEX(country, '/', 1) -- returns country i.e. before slash
SUBSTRING_INDEX(country, '/', -1) -- returns state i.e. after slash
navilink
  • 190
  • 1
  • 2
  • 14
1

Try this:

 update info
    set 
    state = case when locate('/', country) = 0 then state
    else substr(country,locate('/', country)+1) end,
    country = case when locate('/', country) = 0 then country
    else substr(country,1,locate('/', country)-1) end

Since you can have at most one slash, this simply splits the string at the slash and updates the respective columns.

Demo

It is however important to fix this issue at the application level, since the app is not respecting the table structure when it inserts data.

shree.pat18
  • 21,449
  • 3
  • 43
  • 63
0

This database is not normalized. You'd be better off if you broke those into two separate columns.

This implementation is misleading: it's not a country; it's a country/state.

You can't easily have country or state in a WHERE or ORDER BY clause.

I see no redeeming value in doing things this way, except "I've already got 4.5M rows" and "I'd have to change all the apps that use this database".

Change it now before you have 9M rows. You can adapt something like this for your purposes.

duffymo
  • 305,152
  • 44
  • 369
  • 561
  • 2
    I think ( not sure) that the OPs goal is to normalize the existing data. – Anthony Jun 16 '15 at 09:03
  • As I mentioned in my question, exactly that is my purpose. The data I have in the database is from an external source (from CSV file with LOAD DATA procedure), and I want to normalize it as much as possible. – Zoli Jun 16 '15 at 09:34
  • if it's from an external source, you can manipulate it prior to loading by using any programming language you wish. – duffymo Jun 16 '15 at 09:45
  • You are right, but I already have this in the db, I need to chang it now. Thanks anyway. – Zoli Jun 16 '15 at 10:00
  • If the data's in that file to be loaded, I'd drop the schema, recreate it, manipulate the data, and load it correctly. One more thing: should be CA, not Ca. Use standard state codes. – duffymo Jun 16 '15 at 10:02
  • I have a few hundred's of CSV files, from a few KB to 200-300 MB in size. I tought it would be much faster to do this directly with the data already in the table. – Zoli Jun 16 '15 at 10:06
  • You might be correct. Looks like you have a solution to try. – duffymo Jun 16 '15 at 12:55