1

Firstly, let me apologize for the html table. It's the only way I could make this skewed table look good.

Now as for the question, I'm trying to find the best way to save an array of coordinates (Coordinates column in table) in a column of a csv file for later use.

At the moment, after reading the CSV using dask like so: dd.read_csv(), the list is a string representation like so: '[[...][...]..]]' on which I use ast.literal_eval to convert the string representation of the list to a list but this is inefficient if I can find the right method to save the array.

In conclusion:
How to save numpy array in CSV column for easiest reload as numpy array later? OR How to extract numpy array from CSV column?

Edit:
I'm using multiprocessing so each entry in the below table is first saved as a list of lists then after all the iterations the list is converted to a dataframe which is saved as a CSV file

<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

<head>
  <meta http-equiv=Content-Type content="text/html; charset=windows-1252">
  <meta name=ProgId content=Excel.Sheet>
  <meta name=Generator content="Microsoft Excel 15">
  <link rel=File-List href="monte_carlo%20Jun%2013,%202019%20@%2008_files/filelist.xml">
  <style id="monte_carlo Jun 13, 2019 @ 08.14__21583_Styles">
    <!--table {
      mso-displayed-decimal-separator: "\.";
      mso-displayed-thousand-separator: " ";
    }
    
    .xl1521583 {
      padding-top: 1px;
      padding-right: 1px;
      padding-left: 1px;
      mso-ignore: padding;
      color: black;
      font-size: 11.0pt;
      font-weight: 400;
      font-style: normal;
      text-decoration: none;
      font-family: Calibri, sans-serif;
      mso-font-charset: 0;
      mso-number-format: General;
      text-align: general;
      vertical-align: bottom;
      mso-background-source: auto;
      mso-pattern: auto;
      white-space: nowrap;
    }
    
    .xl6321583 {
      padding-top: 1px;
      padding-right: 1px;
      padding-left: 1px;
      mso-ignore: padding;
      color: black;
      font-size: 11.0pt;
      font-weight: 400;
      font-style: normal;
      text-decoration: none;
      font-family: Calibri, sans-serif;
      mso-font-charset: 0;
      mso-number-format: General;
      text-align: center;
      vertical-align: middle;
      mso-background-source: auto;
      mso-pattern: auto;
      white-space: nowrap;
    }
    
    .xl6421583 {
      padding-top: 1px;
      padding-right: 1px;
      padding-left: 1px;
      mso-ignore: padding;
      color: black;
      font-size: 11.0pt;
      font-weight: 400;
      font-style: normal;
      text-decoration: none;
      font-family: Calibri, sans-serif;
      mso-font-charset: 0;
      mso-number-format: General;
      text-align: left;
      vertical-align: middle;
      mso-background-source: auto;
      mso-pattern: auto;
      white-space: normal;
    }
    
    -->
  </style>
</head>

<body>
  <!--[if !excel]>&nbsp;&nbsp;<![endif]-->
  <!--The following information was generated by Microsoft Excel's Publish as Web
Page wizard.-->
  <!--If the same item is republished from Excel, all information between the DIV
tags will be replaced.-->
  <!----------------------------->
  <!--START OF OUTPUT FROM EXCEL PUBLISH AS WEB PAGE WIZARD -->
  <!----------------------------->

  <div id="monte_carlo Jun 13, 2019 @ 08.14__21583" align=center x:publishsource="Excel">

    <table border=0 cellpadding=0 cellspacing=0 width=2451 style='border-collapse:
 collapse;table-layout:fixed;width:1841pt'>
      <col width=121 style='mso-width-source:userset;mso-width-alt:4425;width:91pt'>
      <col width=94 style='mso-width-source:userset;mso-width-alt:3437;width:71pt'>
      <col width=84 style='mso-width-source:userset;mso-width-alt:3072;width:63pt'>
      <col width=416 style='mso-width-source:userset;mso-width-alt:15213;width:312pt'>
      <col width=49 span=3 style='mso-width-source:userset;mso-width-alt:1792;
 width:37pt'>
      <col width=130 span=2 style='mso-width-source:userset;mso-width-alt:4754;
 width:98pt'>
      <col width=128 style='mso-width-source:userset;mso-width-alt:4681;width:96pt'>
      <col width=171 span=2 style='mso-width-source:userset;mso-width-alt:6253;
 width:128pt'>
      <col width=169 style='mso-width-source:userset;mso-width-alt:6180;width:127pt'>
      <col width=84 style='mso-width-source:userset;mso-width-alt:3072;width:63pt'>
      <col width=606 style='mso-width-source:userset;mso-width-alt:22162;width:455pt'>
      <tr height=20 style='height:15.0pt'>
        <td height=20 class=xl1521583 width=121 style='height:15.0pt;width:91pt'>L-string</td>
        <td class=xl1521583 width=94 style='width:71pt'>Coordinates</td>
        <td class=xl1521583 width=84 style='width:63pt'>Area</td>
        <td class=xl1521583 width=416 style='width:312pt'>Bounding Coordinates</td>
        <td class=xl1521583 width=49 style='width:37pt'>% of F</td>
        <td class=xl1521583 width=49 style='width:37pt'>% of +</td>
        <td class=xl1521583 width=49 style='width:37pt'>% of -</td>
        <td class=xl1521583 width=130 style='width:98pt'>Longest F sequence</td>
        <td class=xl1521583 width=130 style='width:98pt'>Longest + sequence</td>
        <td class=xl1521583 width=128 style='width:96pt'>Longest - sequence</td>
        <td class=xl1521583 width=171 style='width:128pt'>Average chars between Fs</td>
        <td class=xl1521583 width=171 style='width:128pt'>Average chars between +s</td>
        <td class=xl1521583 width=169 style='width:127pt'>Average chars between -s</td>
        <td class=xl1521583 width=84 style='width:63pt'>Angle</td>
        <td class=xl1521583 width=606 style='width:455pt'>Rules</td>
      </tr>
      <tr height=120 style='height:90.0pt'>
        <td height=120 class=xl6321583 style='height:90.0pt'>F---++F-F-+F-F-+</td>
        <td class=xl6421583 width=94 style='width:71pt'>[[ 0. 0. 0.]<br>
          <span style='mso-spacerun:yes'> </span>[ 0. 1. 0.]<br>
          <span style='mso-spacerun:yes'> </span>[ 0.4 1.9 0.]<br>
          <span style='mso-spacerun:yes'> </span>[-0.4 1.4 0.]<br>
          <span style='mso-spacerun:yes'> </span>[ 0.03 0.6 0.]<br>
          <span style='mso-spacerun:yes'> </span>[ 0.1 1.6 0.]]</td>
        <td class=xl6321583>3.531154679</td>
        <td class=xl6321583>(-0.4909271959019015, 0.0, 0.4226182617406995, 1.90630778703665)
        </td>
        <td class=xl6321583>0.3125</td>
        <td class=xl6321583>0.25</td>
        <td class=xl6321583>0.4375</td>
        <td class=xl6321583>1</td>
        <td class=xl6321583>2</td>
        <td class=xl6321583>3</td>
        <td class=xl6321583>2.25</td>
        <td class=xl6321583>2.666666667</td>
        <td class=xl6321583>1.166666667</td>
        <td class=xl6321583>0.715584993</td>
        <td class=xl6321583>{'X': {'options': ['---++X', 'F-F-+X'], 'probabilities': [0.24384925254542278, 0.7561507474545772]}}</td>
      </tr>
      <![if supportMisalignedColumns]>
      <tr height=0 style='display:none'>
        <td width=121 style='width:91pt'></td>
        <td width=94 style='width:71pt'></td>
        <td width=84 style='width:63pt'></td>
        <td width=416 style='width:312pt'></td>
        <td width=49 style='width:37pt'></td>
        <td width=49 style='width:37pt'></td>
        <td width=49 style='width:37pt'></td>
        <td width=130 style='width:98pt'></td>
        <td width=130 style='width:98pt'></td>
        <td width=128 style='width:96pt'></td>
        <td width=171 style='width:128pt'></td>
        <td width=171 style='width:128pt'></td>
        <td width=169 style='width:127pt'></td>
        <td width=84 style='width:63pt'></td>
        <td width=606 style='width:455pt'></td>
      </tr>
      <![endif]>
    </table>

  </div>


  <!----------------------------->
  <!--END OF OUTPUT FROM EXCEL PUBLISH AS WEB PAGE WIZARD-->
  <!----------------------------->
</body>

</html>
Izak Joubert
  • 906
  • 11
  • 29

1 Answers1

0

I would store the column as a separate npy file.

I would use at save time:

df.assign(Coordinates=np.nan).to_csv(...)   # save all columns except coordinates to a csv file
np.save('... .npy', df['Coordinates'].values) # save coordinates as a npy file

Then at load time:

df = pd.read_csv(...)
df['coordinates'] = np.load('... .npy')

As pandas internally uses numpy ndarrays for its columns, this allows to directly save and load an object numpy arrays containing other numpy arrays.

Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
  • Thanks Serge. Although it does seem tedious to have to do a save of the dataframe and the array... Maybe I should pickle the dataframe instead of saving to csv? – Izak Joubert Jun 13 '19 at 09:12
  • **You** asked for a csv ;-) . More seriously, numpy ndarrays are pickable, so it is probably a nice solution both simple and reasonably efficient. – Serge Ballesta Jun 13 '19 at 09:30
  • Yes apologies. I thought of pickling only after asking the question. But I'll be pickling a dataframe containing a numpy array but dataframes are also picklable so it should be fine. Thanks for the attention. I'm going to mark your answer for this question but I'm not going to use it anymore – Izak Joubert Jun 13 '19 at 09:39