
I'm reading a CSV file being written by an instrument. I read the data into a dataframe, do a little cleaning, set a datetime index, and then insert the data into a PostgreSQL table. I can't find an efficient way of inserting only the new data.

I've been thinking of several approaches but can't make any of them concrete. One idea was to query the last row stored in the table, look it up in the dataframe, and slice from there. Another was to do it on the PostgreSQL side: insert the whole dataframe and find a query that keeps only the non-intersecting rows. An even simpler one would be getting the length of the dataframe and somehow passing it as the starting row.
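For example, the slicing ideas would look roughly like this (a toy sketch with a made-up `value` column, not my real data):

```python
import pandas as pd

# Toy dataframe standing in for the instrument data: one row per second.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2019-05-02 00:00:00", freq="s", periods=4, name="Datetime"),
)

# Idea 1: keep only rows strictly after the last timestamp already stored.
last_ts = pd.Timestamp("2019-05-02 00:00:01")
new_by_ts = df[df.index > last_ts]

# Idea 2: skip the first n rows, where n is the row count already stored.
stored_rows = 2
new_by_count = df.iloc[stored_rows:]
```

Both give the same two "new" rows here, but I'm not sure which is safer in practice.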

    def readerCCN(self):
        # The first lines of the file hold the path, date, and time;
        # build the starting timestamp from the date and time rows.
        with open(self.file_path, 'r') as file:
            path = next(file)
            date = next(file).strip().split(',')
            time = next(file).strip().split(',')
            timestamp = pd.Timestamp(date[1] + ' ' + time[1])

        # Read the data itself, skipping the header lines above.
        dataframe = pd.read_csv(self.file_path, header=3, skip_blank_lines=True,
                                comment="!", keep_default_na=True)

        dataframe.columns = dataframe.columns.str.strip()  # fix whitespace in headers
        dataframe = dataframe.set_index('Time')
        dataframe = dataframe.groupby('Time').mean()
        dataframe.index = pd.date_range(timestamp, freq='s', periods=len(dataframe))
        dataframe.index.names = ['Datetime']
        #last_row = self.sql.table_last_row('Raw Data', 'ccn')

        self.dataframe = dataframe
        self.sql.table_entry(dataframe, 'Raw Data', 'ccn')

This is how my data looks:

                     Current SS  Temps Stabilized  Delta T     T1 Set    T1 Read     T2 Set    T2 Read     T3 Set    T3 Read  Nafion Set   T Nafion  Inlet Set    T Inlet    OPC Set      T OPC   T Sample  Sample Flow  Sheath Flow  Sample Pressure  Laser Current  overflow  Baseline Mon  1st Stage Mon  Bin #  Bin 1  Bin 2  Bin 3   Bin 4  Bin 5  Bin 6  Bin 7  Bin 8  Bin 9  Bin 10  Bin 11  Bin 12  Bin 13  Bin 14  Bin 15  Bin 16  Bin 17  Bin 18  Bin 19  Bin 20  CCN Number Conc  Valve Set  Alarm Code  Alarm Sum
Datetime                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
2019-05-02 00:00:00         0.1               1.0    3.264  27.709753  27.683804  29.341753  29.099846  30.745274  30.540390   26.709753  26.765661  28.709753  28.766088  32.745274  32.870701  25.833349    45.147003   455.568115      1016.933960      78.315002       0.0      2.787700        0.56730    0.0    4.0    5.0   13.0   74.00    0.0   2.00   0.00   0.00    1.0    1.00     0.0    0.00     0.0    1.00     0.0     0.0     0.0     0.0     0.0     0.0       134.035156   2.840745         0.0        NaN
2019-05-02 00:00:01         0.1               1.0    3.264  27.709753  27.712530  29.341753  29.124981  30.745274  30.583268   26.709753  26.769858  28.709753  28.840857  32.745274  32.824543  25.821905    45.437126   458.366699      1016.158752      77.395000       0.0      2.789225        0.56730    0.0    1.0    8.0   28.0   92.25    3.5   1.75   1.25   0.75    1.0    0.25     2.0    0.25     0.0    0.25     0.0     0.0     0.0     0.0     0.0     0.0       111.956049   2.741862         0.0        NaN
2019-05-02 00:00:02         0.1               1.0    3.264  27.709753  27.726048  29.341753  29.148849  30.745274  30.581791   26.709753  26.774817  28.709753  28.824072  32.745274  32.797459  25.853186    45.054581   458.191650      1016.778931      78.257500       0.0      2.790140        0.56974    0.0    0.0    8.0   35.0   79.00    2.0   1.00   0.00   1.00    0.0    0.00     0.0    0.00     0.0    0.00     0.0     0.0     0.0     0.0     0.0     0.0       184.126541   2.800261         0.0        NaN
2019-05-02 00:00:03         0.1               1.0    3.264  27.709753  27.717598  29.341753  29.122658  30.745274  30.542925   26.709753  26.780920  28.709753  28.850012  32.745274  32.808140  25.815039    45.229172   457.666992      1016.951233      77.222504       0.0      2.790140        0.57096    0.0    0.0    3.0   30.0   78.00    2.0   0.00   1.00   1.00    0.0    1.00     0.0    1.00     0.0    0.00     0.0     0.0     0.0     0.0     0.0     0.0       153.451904   2.759360         0.0        NaN
2019-05-02 00:00:04         0.1               1.0    3.264  27.709753  27.706615  29.341753  29.122658  30.745274  30.542080   26.709753  26.788549  28.709753  28.790503  32.745274  32.817295  25.779943    44.972427   454.168701      1016.865051      77.107498       0.0      2.791360        0.57096    0.0    3.0    5.0   29.0  103.00    2.0   0.00   2.00   1.00    1.0    0.00     0.0    0.00     0.0    0.00     0.0     0.0     0.0     0.0     0.0     0.0       194.505920   2.812238         0.0        NaN
2019-05-02 00:00:05         0.1               1.0    3.264  27.709753  27.694786  29.341753  29.101536  30.745274  30.546305   26.709753  26.764135  28.709753  28.689795  32.745274  32.823399  25.845556    44.941620   459.765869      1016.916748      78.142502       0.0      2.791360        0.56974    0.0    1.0    8.0   30.0   83.00    2.0   3.00   0.00   0.00    0.0    1.00     1.0    0.00     0.0    0.00     0.0     0.0     0.0     0.0     0.0     0.0       173.712936   2.855573         0.0        NaN
2019-05-02 00:00:06         0.1               1.0    3.264  27.709753  27.694786  29.341753  29.086327  30.745274  30.539545   26.709753  26.811438  28.709753  28.758459  32.745274  32.763889  25.830297    45.475632   463.264160      1016.072632      78.372505       0.0      2.788920        0.57096    0.0    2.0    8.0   23.0   73.00    3.0   1.00   3.00   0.00    1.0    0.00     1.0    0.00     0.0    0.00     0.0     0.0     0.0     0.0     0.0     0.0       150.011292   2.797190         0.0        NaN
2019-05-02 00:00:07         0.1               1.0    3.264  27.709753  27.685493  29.341753  29.099846  30.745274  30.558132   26.709753  26.776342  28.709753  28.827124  32.745274  32.768467  25.828772    44.797852   454.168701      1016.865051      77.107498       0.0      2.790140        0.56608    0.0    1.0    6.0   43.0  101.00    4.0   4.00   1.00   1.00    2.0    2.00     2.0    0.00     0.0    0.00     0.0     0.0     0.0     0.0     0.0     0.0       223.349808   2.713870         0.0        NaN
2019-05-02 00:00:08         0.1               1.0    3.264  27.709753  27.704926  29.341753  29.109985  30.745274  30.568272   26.709753  26.762609  28.709753  28.770666  32.745274  32.769993  25.836401    45.136734   459.416016      1016.623901      77.222504       0.0      2.788920        0.56974    0.0    1.0    9.0   27.0   81.00    6.0   0.00   1.00   1.00    0.0    0.00     0.0    0.00     0.0    0.00     0.0     0.0     0.0     0.0     0.0     0.0       167.250229   2.809274         0.0        NaN
2019-05-02 00:00:09         0.1               1.0    3.264  27.709753  27.713375  29.341753  29.104071  30.745274  30.553909   26.709753  26.777868  28.709753  28.756933  32.745274  32.751682  25.821142    45.064850   455.043213      1016.778931      77.222504       0.0      2.791360        0.56608    0.0    1.0    5.0   25.0   66.00    3.0   0.00   0.00   0.00    0.0    0.00     0.0    0.00     1.0    0.00     0.0     0.0     0.0     0.0     0.0     0.0       135.635880   2.858126         0.0        NaN

I would like to upload only the new data in the file, without losing any data.
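To illustrate the database-side idea, here is a sketch of asking the table for its newest timestamp and appending only what comes after it. I use `sqlite3` here as a stand-in for PostgreSQL (the table name `ccn` matches my code; the `value` column is made up):

```python
import sqlite3
import pandas as pd

# sqlite3 stands in for a PostgreSQL connection; the queries would be
# the same shape with psycopg2/SQLAlchemy.
conn = sqlite3.connect(":memory:")

# First run: the table does not exist yet, so everything is new.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.date_range("2019-05-02 00:00:00", freq="s", periods=3, name="Datetime"),
)
df.to_sql("ccn", conn, if_exists="append")

# Later run: the file has grown by two rows.
df2 = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.date_range("2019-05-02 00:00:00", freq="s", periods=5, name="Datetime"),
)

# Ask the database for the newest timestamp it already holds
# (ISO-formatted timestamps compare correctly even as text)...
last_ts = conn.execute('SELECT MAX("Datetime") FROM ccn').fetchone()[0]

# ...and append only the rows strictly after it.
new_rows = df2[df2.index > pd.Timestamp(last_ts)]
new_rows.to_sql("ccn", conn, if_exists="append")
```

This way the file can be re-read from the start every time without duplicating rows, as long as timestamps are strictly increasing.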

  • Might sound really simple, but why can't you just add in a unique ID and set that as the primary key? – Umar.H Aug 01 '19 at 18:59
  • @Datanovice well I can, but I haven't really worked much with SQL. I'm slowly learning. What I'm building is going to be a large database with a DAS collecting data from 8+ instruments. How can that help me? – Lemanuel Colon Aug 01 '19 at 22:18
  • I think we need to see your data to help, just code isn't helpful (hence the lack of responses) – Umar.H Aug 01 '19 at 22:20
  • Here is a *.csv of how my code would output a data file from my instrument; this is at least one hour of data. [data](https://drive.google.com/file/d/115KElLhIf7EgsBN43NfFqKIQQSj-FkIl/view?usp=sharing) – Lemanuel Colon Aug 01 '19 at 22:30
  • are you able to post 10 rows of your data in the post above? post it as text and have a look at this answer : https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Umar.H Aug 01 '19 at 22:33
  • I will try; the data has 30+ columns – Lemanuel Colon Aug 01 '19 at 22:35
  • @Datanovice did my best; there's a well-formatted `df.head(10)` of how my data looks. – Lemanuel Colon Aug 01 '19 at 22:54
  • I think I understand now. You could try `from datetime import datetime` `today = datetime.today().strftime('%d-%m-%Y')` `df.loc[df.index > today]` this would only get data that you consider new (i.e. today) is that what you were after? – Umar.H Aug 02 '19 at 10:42
  • @Datanovice this option is actually pretty clever, but I'm reading the data from another computer. Couldn't de-synchronization between the computers' clocks lead to data loss? Also, the program writes data to a new file every hour, and I'm trying to do this in real time so I can query that data in real time into a GUI. Second, if the program starts reading the file late, that would lead to lost data too. – Lemanuel Colon Aug 02 '19 at 20:24

0 Answers