I am working with a data set with a combined 300 million rows, split over 5 csv files. The data contains weight measurements of users over 5 years (one file per year). As calculations take ages in this massive data set, I would like to work with a subset of users to create the code. I've used the nrows function to import only the first 50000 lines of each file. However, one user may have 400 weight measurements in the file for year 2014 but only 240 in year 2015. I therefore don't get the same set of users from each file when I import with the nrows function. I am wondering whether there is a way to import the data of the first 1000 users in each file? The data looks like this in all files:
user_ID date_local weight_kg
0002a3e897bd47a575a720b84aad6e01632d2069 2016-01-07 99.2
0002a3e897bd47a575a720b84aad6e01632d2069 2016-02-08 99.6
0002a3e897bd47a575a720b84aad6e01632d2069 2016-02-10 99.5
000115ff92b4f18452df4a1e5806d4dd771de64c 2016-03-13 99.1
000115ff92b4f18452df4a1e5806d4dd771de64c 2016-04-20 78.2
000115ff92b4f18452df4a1e5806d4dd771de64c 2016-05-02 78.3
000115ff92b4f18452df4a1e5806d4dd771de64c 2016-05-07 78.9
0002b526e65ecdd01f3a373988e63a44d034c5d4 2016-08-15 82.1
0002b526e65ecdd01f3a373988e63a44d034c5d4 2016-08-22 82.6
Thanks a lot in advance!