0

I have a database that stores duplicates of everything and we need to be able to extract only 1 copy of this. the table is:

+----+------------+-----------+-------------+
| id | locationId | name      | description |
+----+------------+-----------+-------------+
|  1 |          1 | rgh       | lol         |
|  2 |          1 | rgh       | lol         |
|  3 |          2 | tbh       | gjbh        |
|  4 |          2 | tbh       | gjbh        |
|  5 |          3 | lolcbj999 | abububh     |
|  6 |          3 | lolcbj999 | abububh     |
|  7 |          4 | test      | vhjj        |
|  8 |          4 | test      | vhjj        |
|  9 |          5 | ghh       | yhhh        |
| 10 |          5 | ghh       | yhhh        |
+----+------------+-----------+-------------+

We tried using the %2 ==0 method but the problem is, notice how locationId is a foreign key to this table

+----+--------+-----------+-----------+-----------+
| id | walkId | longitude | latitude  | timestamp |
+----+--------+-----------+-----------+-----------+
|  1 |      1 | -4.067431 | 52.416904 |         2 |
|  2 |      2 | -4.067398 | 52.416908 |         0 |
|  3 |      3 | -4.067419 | 52.416904 |         3 |
|  4 |      4 | -4.067482 | 52.416897 |         1 |
|  5 |      5 | -4.067458 | 52.416897 |         1 |
+----+--------+-----------+-----------+-----------+

We want to display this contents in an XML file, so how can we go about doing this? based on the fact. We cannot seem to grasp this, we can return all the odds so wlak id 1,3,5 but miss on 2 and 4 and vise versa. Any ideas?

We'd prefer it if there was a way to stop records being added twice if so using this database php code what do we need to change:

<?php
/**
 * This file contains database related functions.
 *
 * @file
 */

# prevent users from accessing this page via their browser
if ( !defined( 'ENTRYPOINT' ) ) {
  die( 'This is not a valid entry point.' );
}

require dirname( __FILE__ ) . '/../config.php';
/**
 * This function runs the query that will attempt to create the correct tables
 * in the database.
 *
 * @return a string detailing errors encountered, or FALSE if the query worked.
 */
function createDatabase() {
  $db = openConnection();

  $sql = <<<'SQL'
  CREATE TABLE tbl_routes (
    id int NOT NULL AUTO_INCREMENT, title varchar(255), shortDesc varchar(255), longDesc varchar(255),
    hours float(12), distance float(12), PRIMARY KEY (id));
  CREATE TABLE tbl_locations (
     id int NOT NULL AUTO_INCREMENT, walkId int NOT NULL, latitude float(12), longitude float(12), timestamp float(12),
     PRIMARY KEY (id), FOREIGN KEY (walkId) REFERENCES tbl_routes(id));
   CREATE TABLE tbl_places (
     id int NOT NULL AUTO_INCREMENT, locationId int NOT NULL, description varchar(255),
     PRIMARY KEY (id), FOREIGN KEY (locationId) REFERENCES tbl_locations(id));
  CREATE TABLE tbl_images (
    id int NOT NULL AUTO_INCREMENT, placeId int NOT NULL, photoName varchar(255),
    PRIMARY KEY (id), FOREIGN KEY (placeId) REFERENCES tbl_places(id));
SQL;

  $query = executeSql( $sql );
  if ( !is_string( $query ) ) {
    return FALSE;
  } else {
    return $query;
  }
}

/**
 * This function inputs a walk into the database.
 *
 * @param a walk object, as created by the upload.php function
 * @return a string describing the error, or SUCCESS if things worked
 */
function inputWalk( $walk ) {
  # TODO
  # we need to escape the inputs to proof against SQL injections
  # see rfc 4389 2.5.2
  $values = [
    "NULL", # ID is assigned by the database thanks to AUTO_INCREMENT
    "'$walk->title'",
    "'$walk->shortDesc'",
    "'$walk->longDesc'",
    "NULL",
    "NULL"
    ];

  $sql = 'INSERT INTO tbl_routes VALUES (' . implode( $values, ',' ) . ');';

  $db = openConnection();
  $query = mysqli_query( $db, $sql );
  if ( is_bool( $query ) && $query === TRUE ) {
    # the SQL query worked and the data is stored
    # we need to request the data back to see what the ID has been set to
    $walkId = mysqli_insert_id( $db );
    foreach ( $walk -> locations as &$location ) {
      $query = null;
      $values = null;
      $values = [
        "NULL", # location ID
        $walkId,
        "'" . mysqli_real_escape_string( $db, $location->latitude ) . "'",
        "'" . mysqli_real_escape_string( $db, $location->longitude ) . "'",
        "'" . mysqli_real_escape_string( $db, $location->timestamp ) . "'"
      ];
      $query = "INSERT INTO tbl_locations VALUES (" . implode( $values, ',' ) . ");";
      $sql =  mysqli_query( $db, $query );
      if ( $sql === FALSE ) { return $query; }
      # now need to get the locationId
      $locID = mysqli_insert_id( $db );

      foreach ( $location -> descriptions as &$description ) {
        $description[0] = mysqli_real_escape_string( $db, $description[0] );
        $description[1] = mysqli_real_escape_string( $db, $description[1] );
        # TODO: BUG: doesn't actually work...
        $sql = "INSERT INTO tbl_places VALUES (NULL, '$locID', '$description[0]', '$description[1]');";
        $query = mysqli_query( $db, $sql );
        if ( $query === FALSE ) { return mysqli_error( $db ); }
      }


      foreach ( $location -> images as &$image ) {
        $query = "INSERT INTO tbl_images VALUES (NULL, $locID, '$image[0]');";
        mysqli_query( $db, $sql );
      }
    }

    # everything worked
    return 'SUCCESS'; # TODO: make this less hackish
  } else {
    return $query;
  }

}

/**
 * This function opens a connection to the database.
 * @return an active database connection.
 *
 * Use closeConnection() to close it again.
 */
function openConnection() {
  global $DB_ADDR, $DB_USER, $DB_PASS, $DB_NAME;
  $db = mysqli_connect( $DB_ADDR, $DB_USER, $DB_PASS, $DB_NAME );
  if ( !$db ) {
    # connection failed
    die( 'Could not connect to MySQL' );
  }
  return $db;
}

/**
 * Executes an SQL statement.
 * @param the SQL to execute
 * @return the result of the action
 *
 * On success it will return a mysqli_result() object when the SQL statement is
 * a SELECT, SHOW, DESCRIBE or EXPLAIN. Other successes will return TRUE.
 *
 * Failures will return a string explaining the error. Use is_object() to
 * discern between the error string and a result.
 */
function executeSql( $sql ) {
  $db = openConnection();
  $query = mysqli_query( $db, $sql );
  if ( $query === FALSE ) {
    $query = mysqli_error( $db );
  }
  closeDatabase( $db );
  return $query;
}

/**
 * Closes an active database connection.
 * @param the database connection to close.
 */
function closeDatabase( $db ) {
  mysqli_close( $db );
}

/**
 * Fetches the first field of the first row of a result
 * @param The result to fetch
 * @return the first field of the first row
 * @deprecated
 */
function fetchID( mysqli_result $result ) {
  return $result->fetch_row()[0];
}

Cheers, Chris.

chris edwards
  • 1,332
  • 4
  • 13
  • 31
  • [look at the rownumber per group question & answer](http://stackoverflow.com/questions/17939198/row-number-per-group-in-mysql), you can put that in a subquery and only select the rownumber=1 variants. – Wrikken Jan 29 '14 at 15:59
  • Group by locationId, name, and description. It might seem logical to pull every other row, but what happens when duplicate rows aren't right next to each other? A subquery might work better if you have a large number of rows. You'll need to test to see which is quicker. – Dave Jan 29 '14 at 16:02
  • GROUP BY would be better, you'd be making a re-select for *every* row with a subquery, which is pretty inefficient. – MackieeE Jan 29 '14 at 16:03
  • Still cant seem to get this going, is there an example of the subquery you mention? our web guy has left so we are sort of on our own at the moment. – chris edwards Jan 29 '14 at 16:06
  • @MackieeE: that depends on how you use a subquery... a `SELECT * FROM (some subquery) sub WHERE sub.rownumber=1` would only run the subquery _once_ of course. – Wrikken Jan 29 '14 at 16:08
  • Of course, if you only _want_ one in the table, and you don't care which one you discard, you can `ALTER IGNORE tbl_places ADD UNIQUE(locationId);`. Note that if there is a bug now that enters them twice, the second query will fail with an error, unless you use an `INSERT IGNORE` (ignore new data) or `INSERT... ON DUPLICATE KEY ..` (update to new data) syntax. Probably you don't want more then 1 entry in `$location->descriptions`, so instead of a `foreach`, you could just fetch the first item & store that one, discarding the rest. – Wrikken Jan 29 '14 at 16:11

1 Answers1

0

You could use group by:

select min(id), locationId, name, description 
from table t
group by locationId, name, description;

That might not be so efficient on a large table. Instead, you could create an index on table(locationId, name, description, id) and do:

select id, locationId, name, description
from table t
where not exists (select 1
                  from table t2
                  where t2.locationId = t.locationId and t2.name = t.name and
                        t2.description = t.description and t2.id < t.id
                 );

This will select the first from each group and might be rather efficient with the above mentioned index.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786