I have a data frame (df) as shown below. It is the result of merging two data frames by hour. The NAs mark the hours in which no data arrived in the system.

Vol     Nu        Date       lat      long
0       NA  2017-01-01 01     NA       NA
2       NA  2017-01-01 02     NA       NA
0       NA  2017-01-01 03     NA       NA
0       NA  2017-01-01 07     NA       NA
0       NA  2017-01-01 08     NA       NA
0        2  2017-01-01 09  80.85243 26.78307
0       NA  2017-01-01 10     NA       NA
0       NA  2017-01-01 11     NA       NA
0        2  2017-01-02 01  80.90426 26.77535
2       NA  2017-01-02 02     NA       NA
3       NA  2017-01-02 03     NA       NA
0       NA  2017-01-02 04     NA       NA
0       NA  2017-01-02 05     NA       NA
0        2  2017-01-02 06  80.90426 26.77535
...

I need the output shown below: for each day, the values from any hour that has data should be copied to every other hour of that same day. For example, 2017-01-01 has data at hour 09, so those values should be filled into the remaining hours of 2017-01-01, and likewise for 2017-01-02 and so on.

Expected output:

Vol     Nu        Date       lat      long
0        2  2017-01-01 01  80.85243 26.78307
2        2  2017-01-01 02  80.85243 26.78307
0        2  2017-01-01 03  80.85243 26.78307
0        2  2017-01-01 07  80.85243 26.78307
0        2  2017-01-01 08  80.85243 26.78307
0        2  2017-01-01 09  80.85243 26.78307
0        2  2017-01-01 10  80.85243 26.78307
0        2  2017-01-01 11  80.85243 26.78307
0        2  2017-01-02 01  80.90426 26.77535
2        2  2017-01-02 02  80.90426 26.77535
3        2  2017-01-02 03  80.90426 26.77535
0        2  2017-01-02 04  80.90426 26.77535
0        2  2017-01-02 05  80.90426 26.77535
0        2  2017-01-02 06  80.90426 26.77535

Any help would be much appreciated.

Note: the data set is large, with more than a million rows added per day. Only a few records are shown above.
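
To make the requested fill concrete, here is a minimal sketch in plain Python (standard library only, no particular data-frame library). It rests on several assumptions: each row is a dict keyed by column name, missing values are represented as `None`, the day is the first 10 characters of `Date`, and any non-missing value within a day may be used as the donor. The function name `fill_within_day` and its `fill_cols` parameter are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def fill_within_day(rows, date_col="Date", fill_cols=None):
    """Fill missing (None) values with a non-missing value from the same day."""
    if not rows:
        return []
    # Default: fill every column except the timestamp itself, so a wide
    # table does not need its columns listed one by one.
    if fill_cols is None:
        fill_cols = [c for c in rows[0] if c != date_col]

    # First pass: remember the first non-missing value per (day, column).
    donors = defaultdict(dict)
    for row in rows:
        day = row[date_col][:10]  # assumes "YYYY-MM-DD HH" strings
        for col in fill_cols:
            value = row[col]
            if value is not None:
                donors[day].setdefault(col, value)

    # Second pass: copy each row, replacing None where the day has a donor.
    filled = []
    for row in rows:
        day_values = donors.get(row[date_col][:10], {})
        new_row = dict(row)
        for col in fill_cols:
            if new_row[col] is None:
                new_row[col] = day_values.get(col)  # stays None if no donor
        filled.append(new_row)
    return filled


# A few rows taken from the sample data above.
rows = [
    {"Vol": 0, "Nu": None, "Date": "2017-01-01 08", "lat": None, "long": None},
    {"Vol": 0, "Nu": 2, "Date": "2017-01-01 09", "lat": 80.85243, "long": 26.78307},
    {"Vol": 0, "Nu": None, "Date": "2017-01-01 10", "lat": None, "long": None},
    {"Vol": 0, "Nu": 2, "Date": "2017-01-02 01", "lat": 80.90426, "long": 26.77535},
    {"Vol": 2, "Nu": None, "Date": "2017-01-02 02", "lat": None, "long": None},
]

for row in fill_within_day(rows):
    print(row)
```

The sketch makes two linear passes over the data and never compares rows pairwise, so its cost grows with the number of rows times the number of filled columns. Days with no data in any hour are left as missing. The same idea corresponds to a per-day grouped fill (down, then up) in a data-frame library.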

  • What if you have data in more than one row in a day? Which one will you select? – Ronak Shah Jul 26 '18 at 07:07
  • Hi Ronak Shah, any one would be fine, since the rows become unique by row ID and cluster number at the end, which I did not mention here to avoid giving too much detail. Any one row would be fine, as they will be the same. – Adarsha Murthy Jul 26 '18 at 07:23
  • Check if the marked duplicate solves your problem. You might have to tweak your data to use it. – Ronak Shah Jul 26 '18 at 07:28
  • @Ronak, actually it does not. It talks about just one vector; I have around 50 variables that need to be duplicated, and I can't go one by one. – Adarsha Murthy Jul 27 '18 at 05:44
