0

I have a situation like this, but with much more columns and rows:

TAG WEIGHT SIRE   DAM     TAG WEIGHT SIRE   DAM     TAG WEIGHT  SIRE
1    30    PETER  LINDA    7    25   12     SHERRY  12  25      CHARLIE
2    32    JOHN   DINDA    8    28   KIM    TERE    26  34      JOSH
3    28    PETER  PRINCESS 26   30   PETER  BECK    27  30      PETER
4    25    JOSH   FANTA    28   14   FREDDY LENNY   NA  29.6    NA
5    26    SENIOR TINA     11   16   DANTON 45      NA  NA      NA
6    28    BILL   FRIDA    NA   22.6 NA     NA      NA  NA      NA
NA  28.16  NA     NA       NA   NA   NA     NA      NA  NA      NA

And I would like to separate in three files based on the name of the columns, like this:

File 1

TAG WEIGHT  SIRE    DAM
1   30      PETER   LINDA
2   32      JOHN    DINDA
3   28      PETER   PRINCESS
4   25      JOSH    FANTA
5   26      SENIOR  TINA
6   28      BILL    FRIDA

File 2

   TAG  WEIGHT  SIRE    DAM
    7     25    12      SHERRY
    8     28    KIM     TERE
    26    30    PETER   BECK
    28    14    FREDDY  LENNY
    11    16    DANTON  45

File 3

TAG WEIGHT  SIRE    DAM
12  25      CHARLIE NA
26  34      JOSH    NA
27  30      PETER   NA

So, there is the mean for the weight on the last row for each file, and this information is unnecessary, and some situation there are not all columns (File 3) and so I would like to insert the column name with NA.

Curious G.
  • 838
  • 8
  • 23
  • How is the file stored? If you use `read.csv()` or `readr::read_csv()` or many other functions. Most of them will automatically repair the column names by adding suffixes. – yusuzech Oct 09 '19 at 00:50
  • I would like to create new files, separating this columns. It's a xlsx file. – Curious G. Oct 09 '19 at 00:54
  • Since you want to do this in R, please read the file in R and update your post with `dput(dataframe_name)`. – Ronak Shah Oct 09 '19 at 01:03
  • Sorry @RonakShah, but I didn't understand! Why I need to do this? I'm using the logic. My original file is very big. If solve this, I could to solve my real challenge. – Curious G. Oct 09 '19 at 01:11
  • 1
    CuriosG, I effectively solved this for you in your [previous question](https://stackoverflow.com/a/58289915), though the differentiation there was on cutting *rows*, not *columns* as you're asking for here. Do something similar: find where each batch of columns starts, and re-read those columns. – r2evans Oct 09 '19 at 01:13

1 Answers1

1

if Tag is the defining separator, you could read the first line of the file and find the positions of "Tag", then it's just a matter of looping through the entire CSV file and sort them into different arrays before storing them into separated files.

Here's a quick PHP script. I have not tested it, but this should give you a general idea.

// assuming you have the CSV file already read into an array
// https://www.php.net/fgetcsv
// https://www.php.net/manual/en/function.str-getcsv.php
$csv = str_getcsv(file_get_contents('original-file.csv'));

// get the first row and find all the array position of "Tag"
$header = $csv[0];
$splitKeys = array_keys($header, 'Tag');

$files = [];

// loop through the csv
foreach ($csv as $row) {
  // loop through the split keys
  foreach ($splitKeys as $index => $splitKey) {
    // creates a container for all the found rows
    $fileHash = md5($splitKey);
    if (!isset($files[$fileHash])) $files[$fileHash] = [];

    // find the distance between current split key (pos) and next split key
    // if it's the last split key, then it's the distance between that till the end of the row
    if ($index == count($splitKeys) - 1) {
      $length = count($row) - 1 - $splitKey;
    } else {
      $length = $splitKeys[$index + 1] - $splitKey;
    }

    // use array slice
    $files[$fileHash] = array_slice($row, $splitKey, $length);
  }
}

// now just loop through the files array and put into csv files
// https://www.php.net/manual/en/function.fputcsv.php
foreach ($files as $fileHash => $file) {
  $fp = fopen($fileHash.'.csv', 'w');
  foreach ($file as $row) {
    fputcsv($fp, $row);
  }
  fclose($fp);
}
epicgear
  • 96
  • 1
  • 4