0

I have a TXT file with no punctuation between them. I would like to shred this file by the table column widths in the database and save it.

Let me tell you this step by step…

  1. I’m creating a table in the database with my tabloolustur.php page. The column numbers and column widths of the tables I create will not be the same.

  2. There are no punctuation marks between the TXT file data. First, I want to split the TXT file rows by column width.

    $result = $baglanti->prepare("SHOW COLUMNS FROM customers where Field NOT IN('id')");
    
    $result->execute();
    $colcount = $result->columnCount()-1;
    
    $columLen = array();
    
    foreach($result as $key => $col){
    
       preg_match('/\d+/', $col['Type'], $len);
       $len = (isset($len[0]))? $len[0] : '';  
       $fieldname = $col['Field'];            
       $columLen[$fieldname]  = $len;  
    
    }
    

    For this, I get the number of columns and column widths with the code.

  3. Then, I separate the data with commas with the following function.

    function txtBol($metin, $genislik){
        $parcala=array();
    
        foreach ($genislik  as $sutunadi => $lenght)
        {
            $parcala[$sutunadi] = substr($metin, 0, $lenght);
            $metin = substr($metin, $lenght);
    
        }
    return $parcala;
    }
    
  4. I also get column names with the following code. (ps: to use in a query)

    $KolAdi = $baglanti->query("SHOW COLUMNS FROM customers  where Field NOT IN('id')");
    $KolAdi->execute();
    $colonAdi= $KolAdi->fetchAll(PDO::FETCH_COLUMN);
    $colonAdi=implode(',', $colonAdi);
    

    It prints the data i split correctly when printing it to the screen. So far, so good. But I can’t create the right query with PDO. How should I create the query? (ps: Table column names and column widths are not the same. There will be different numbers and width columns for each table) I would appreciate it if you could help. I proceeded by implementing some solutions from your site.

Table:

id name cev1 cev2 cev3
1 MARTIN EDEN AAAAAA BBBBB CCCC

txt:

MARTIN EDEN........AAAAAABBBBBDDDD
cursorrux
  • 1,382
  • 4
  • 9
  • 20

2 Answers2

0

Assuming a valid PDO connection in $PDO, you can do the whole job like this. It reads the column data once and uses that to create a query to prepare and a regular expression to extract the data.

$table = 'customer';
$txtFile = 'cust.txt';

// No need to prepare this since there's no untrusted data here.
$result = $pdo->query("show columns from `$table` where Field <> 'id'");

// Get column names and widths in arrays
while($columns = $result->fetch(PDO::FETCH_ASSOC)) {
    $colNames[] = $columns['Field'];
    preg_match('/\d+/', $columns['Type'], $len);
    $colLengths[] = $len[0]??'';
}
// Create regex to extract data from lines:
// Looks like this: /(.{20})(.{10})(.{5})/
$regex = '/(.{'.implode('})(.{',$colLengths).'})/u';

// Create skeleton query with table name and field names and placeholders
// Looks like this: INSERT customer (name,addr1,postcode) VALUES (?,?,?)
$query = "INSERT $table (`".implode('`,`', $colNames).'`) VALUES ('.str_repeat('?,', count($colNames)-1).'?)';

// Read text file
if ($fileData = file($txtFile)) {
    // Prepare the query only once
    $stmt = $pdo->prepare($query);
    foreach ($fileData as $row) {
        // Get the data using the regex from above
        preg_match($regex, $row, $rowData);
        // Remove the first row of the regex matches - see PHP manual for why
        array_shift($rowData);
        // Now execute the prepared query using the data extracted by the regex.
        $stmt->execute($rowData);
    }
}

Note that this codes assumes that the table consists of a series of contiguous columns from which it can extract column widths, and no other columns except id, which it ignores. If your actual table structure is different you'll need to modify the SHOW COLUMNS query to omit those columns, or modify the code that extracts the column data to extract only the relevant columns.

{Edit] Updated the code to include /u UTF-8 modifier on the regex, and to wrap the column names in the INSERT query in back ticks.

  • Thank you very much. It's exactly how I wanted it. – martin eden Sep 19 '21 at 20:05
  • There is a small problem. Although the charset is utf8, when saving to the database, the letters İ,ı,Ö,ö,Ş,ş,Ü,ü,Ğ,ğ, '?' saves as. how can i solve this problem? – martin eden Sep 19 '21 at 21:25
  • Adding the UTF-8 modifier to the end of the regex should work (`$regex = '/(.{'.implode('})(.{',$colLengths).'})/u';`. This works on my system, but I couldn't make it display the `?` without the modifier. I use a UTF8 character set on my database by default. You should check that you have a UTF8 character set on your database. – Tangentially Perpendicular Sep 19 '21 at 22:33
  • Dear Tangential Perpendicular Solution I tried your suggestion but it didn't work on my system. it gave the following error. Fatal error: Uncaught PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; – martin eden Sep 20 '21 at 18:27
  • @Martineden Adding the `/u` modifier won't affect the SQL syntax. Most likely you have a column name in your table that clashes with a MySQL keyword. I have updated the code above to wrap column names in backticks that should deal with this. Note that I run this on my development system with a limited set of test data. I am not in a position to debug this thoroughly for your environment without a comprehensive set of test data and tables. That's a different task and [so] is not the place. – Tangentially Perpendicular Sep 20 '21 at 20:04
  • Dear @Tangential Perpendicular I changed the table column names, I tried the updated code but I got the same error. You are also right in what you said. Thank you very much for your help. you are great – martin eden Sep 20 '21 at 20:26
  • dear @Tangentially Perpendicular. when you print the first line to the database, the first character is '?' it prints. on the first line, the sorting is broken. But it accurately records the next lines. when I print to the screen iimage: https://www.hizliresim.com/shc1c5k When I print to the database image: https://www.hizliresim.com/tjx74n8 – martin eden Sep 23 '21 at 21:16
  • It looks like your file has a [Byte Order Mark](https://en.wikipedia.org/wiki/Byte_order_mark). UTF-8 files permit it but don't require it. You can remove it with Notepad++ or other tools (See [this question](https://stackoverflow.com/q/32986445/14853083)), or you can modify the code above to remove it from the first line before you process the file (See [this question](https://stackoverflow.com/q/28248518/14853083)) – Tangentially Perpendicular Sep 24 '21 at 01:12
  • dear @Tangentially Perpendicular, your solution suggestion worked. I thank you for your help. You are great – martin eden Sep 24 '21 at 19:08
-1

you can create a table like this and you don't need to check a number of fields by rows let me know if this useful

CREATE TABLE new_table ( id INT NOT NULL AUTO_INCREMENT, row_id INT NULL DEFAULT 0, row_field_name VARCHAR(50) NULL DEFAULT NULL COMMENT 'Index of ', row_value VARCHAR(255) NULL DEFAULT NULL, PRIMARY KEY (id));

mmantach
  • 148
  • 10