I have a .txt file that looks as follows:
Name 1
@Name( ) Value WATER WHP
Date Unit Unit Unit
-------------- ---------- ---------- ---------- ----------
Name 1 20081220 2900.00 0.00 3300.00
Name 1 20081221 0.00 0.00 3390.00
Name 1 20081222 2500.00 0.00 2802.00
Name 1 20081223 0.00 0.00 3022.00
Name 1 20081224 0.00 0.00 3022.00
I used the following code to import it into Python:
import pandas as pd

# Read each line as a single column, skipping the header block at the top of the file
df = pd.read_csv(r'test_prd.txt', skiprows=6, engine="python", header=None)
df.columns = ['Test']
df.drop(df.tail(1).index, inplace=True)  # because of file format
# Split each line on whitespace into separate columns
df = df.Test.str.split(expand=True)
df.rename(columns={0: 'Name', 1: 'Number', 2: 'Date', 3: 'Value', 4: 'Water', 5: 'WHP'},
          inplace=True)
df['Date'] = pd.to_datetime(df['Date']).dt.floor('D').dt.strftime('%Y-%m-%d')
df['Value'] = df['Value'].astype(float)  # str.split yields strings, so convert before comparing to 0
df['Note'] = df['Value'].apply(lambda x: 'Yes' if x == 0 else '')
del df['Water']
del df['WHP']
df['Name'] = df['Name'].astype(str) + ' ' + df['Number'].astype(str)
del df['Number']
After using this code the data frame looks like:
Name Date Value Note
0 Name 1 2008-12-20 2900.00
1 Name 1 2008-12-21 0.00 Yes
2 Name 1 2008-12-22 2500.00
3 Name 1 2008-12-23 0.00 Yes
4 Name 1 2008-12-24 0.00 Yes
... ... ... ... ...
78 Name 2009-03-15 0.00 Yes
79 Name 2009-03-16 3000.00
80 Name 2009-03-17 0.00 Yes
... ... ... ... ...
I want to print the periods of time (start date - end date) for which the 'Value' column equals zero, i.e., when 'Note' is 'Yes'. Any row where the value is non-zero can be removed from the data frame. If there is a standalone zero (preceded and followed by non-zero values), the start and end dates would be the same.
The expected output should look like this:
Name Start Date End Date Value Note
1 Name 2008-12-21 2008-12-21 0.00 Yes
2 Name 2008-12-23 2009-03-15 0.00 Yes
3 Name 2009-03-17 *** 0.00 Yes
... ... ... ... ...
I was trying to use a conditional if statement or df.loc, but I don't know my way around Python well enough to put it together. Any advice would be appreciated.
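
For reference, here is a minimal sketch of one way the grouping described above might be done. It is not a definitive solution. It assumes that 'Value' is numeric (if it still holds strings, it would need converting first, e.g. with pd.to_numeric), that rows are already sorted by date within each name, and it uses a small made-up sample frame rather than the real file. The names is_zero, run_id and periods are only illustrative.

```python
import pandas as pd

# Hypothetical sample shaped like the data frame above (not the real file)
df = pd.DataFrame({
    'Name': ['Name 1'] * 6,
    'Date': ['2008-12-20', '2008-12-21', '2008-12-22',
             '2008-12-23', '2008-12-24', '2008-12-25'],
    'Value': [2900.0, 0.0, 2500.0, 0.0, 0.0, 3000.0],
})

# True where the value is zero
is_zero = df['Value'].eq(0)

# A new run starts whenever the zero/non-zero state or the name changes;
# the cumulative sum of those change flags gives each run its own id
changed = is_zero.ne(is_zero.shift()) | df['Name'].ne(df['Name'].shift())
run_id = changed.cumsum()

# Keep only the zero rows and collapse each run to its first and last date
periods = (
    df[is_zero]
    .groupby(run_id[is_zero])
    .agg(Name=('Name', 'first'),
         Start_Date=('Date', 'first'),
         End_Date=('Date', 'last'))
    .rename(columns={'Start_Date': 'Start Date', 'End_Date': 'End Date'})
    .reset_index(drop=True)
)
periods['Value'] = 0.0
periods['Note'] = 'Yes'

print(periods)
```

With this sample, a standalone zero yields a period whose start and end dates are the same, and consecutive zeros collapse into a single row. A run of zeros that reaches the last row of the data would simply end on that last date.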