I'm attempting to simulate the use of pandas to access a constantly changing file.
One script reads a CSV file, appends a row to it, then sleeps for a random time to simulate bulk input:
import pandas as pd
from time import sleep
import random
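# Row that gets appended on every iteration
df2 = pd.DataFrame(data=[['test', 'trial']])
while True:
    df = pd.read_csv('data.csv', header=None)
    # DataFrame.append returned a new frame (and was removed in pandas 2.0), so reassign via concat
    df = pd.concat([df, df2], ignore_index=True)
    # header=False keeps the column labels out of the file, matching header=None on read
    df.to_csv('data.csv', index=False, header=False)
    sleep(random.uniform(0.025, 0.3))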
The second script checks for changes in the data by printing the shape of the DataFrame:
import pandas as pd
while True:
    df = pd.read_csv('data.csv', header=None, names=['Name', 'DATE'])
    print(df.shape)
The problem is that, while I mostly get the correct shape of the DataFrame, at certain times it outputs (0x2).
For example:
...
(10x2)
(10x2)
...
(10x2)
(0x2)
(11x2)
(11x2)
...
This occurs at some, but not all, of the changes in shape (i.e. when the first script adds a row to the file).
I understand this happens when the first script has opened the file to write data and the second script reads it at that moment, hence (0x2). Will this cause any data loss?
I cannot access the stream directly, only the output file. Are there any other possible solutions?
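For the simulation, one possible way to avoid the empty read would be to write the new contents to a temporary file and then swap it in with os.replace, so the reader never opens a truncated data.csv. This is only a minimal sketch: it assumes the writer script can be changed (which is not true for the real stream), and write_csv_atomically is a hypothetical helper name, not a pandas function.

```python
import os
import tempfile


def write_csv_atomically(df, path):
    """Write df to a temporary file, then swap it into place at path."""
    # The temporary file lives in the target's directory so that
    # os.replace stays on a single filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    os.close(fd)
    try:
        df.to_csv(tmp_path, index=False, header=False)
        # Atomic on POSIX; on Windows this can raise PermissionError
        # if another process has the target open at that moment.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In the first script this would stand in for the df.to_csv('data.csv', ...) call, i.e. write_csv_atomically(df, 'data.csv').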
Edit
The purpose of this is to load only the new data (I have code that does that) and do analysis "on the fly". Some of the analysis will include output/sec, graphing (similar to a stream plot), and a few other numerical calculations.
The biggest issue is that I have access to the csv file only, and I need to be able to analyze the data as it comes without loss or delay.
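Since only the reading side is under my control, a possible reader-side guard might look like the sketch below. It assumes the file only ever grows, so any read that returns fewer rows than the last good read is treated as a read of a half-written file and discarded. read_if_not_shrunk is a hypothetical helper name, and this would not catch a partial write that happens to contain the same number of rows as before.

```python
import pandas as pd


def read_if_not_shrunk(path, last_good=None):
    """Return the freshly read frame, or last_good if the read looks incomplete."""
    try:
        df = pd.read_csv(path, header=None, names=['Name', 'DATE'])
    except (FileNotFoundError, pd.errors.EmptyDataError):
        # File missing or empty at this instant: keep the previous result.
        return last_good
    if last_good is not None and len(df) < len(last_good):
        # Assumption: rows are only ever added, so fewer rows means the
        # writer was in the middle of rewriting the file.
        return last_good
    return df
```

In the polling loop this would be used as df = read_if_not_shrunk('data.csv', df), with df starting as None, and the new rows would then be df.iloc[rows_seen:] for whatever rows_seen counter the analysis code keeps.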