0

We have a csv file and written below code to do a group by and get the max value and create an output file. But while reading final output file using data-frame read_csv , it is showing as empty..

Input file: 

Manoj,2020-01-01 01:00:00
Manoj,2020-02-01 01:00:00
Manoj,2020-03-01 01:00:00
Rajesh,2020-01-01 01:00:00
Rajesh,2020-05-01 01:00:00
Suresh,2020-04-01 01:00:00

Final output file:

Manoj,2020-03-01 01:00:00
Rajesh,2020-05-01 01:00:00
Suresh,2020-04-01 01:00:00

and then when i am trying to read above final output file using df.read_Csv it shows dataframe empty.

import os
import re
import pandas as pd
z=open('outfile.csv','w')
fin=[]
k=open('j.csv','r')
for m in k:
    d=m.split(',')[0]
    if d not in fin:
        fin.append(d.strip())

for p in fin:
    gg=[]
    g=re.compile(r'{0}'.format(p))
    y=open('j.csv','r')
    for b in y:
        if re.search(g,b):
            gg.append(b)
    z.write(gg[-1].strip())
    z.write('\n')

df = pd.read_csv("outfile.csv", delimiter=',', names=['Col1','Col2'], header=0)

print(df)

final output: Empty DataFrame , Index: []

Is there anything i missed , please any one suggest...

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Ravi
  • 793
  • 3
  • 16
  • 29
  • Does this answer your question? [Import CSV file as a pandas DataFrame](https://stackoverflow.com/questions/14365542/import-csv-file-as-a-pandas-dataframe) – Trenton McKinney Sep 14 '20 at 17:48

2 Answers2

2
  • It's not necessary to use the for-loop to process the file. The data aggregation is more easily completed in pandas.
  • Your csv is shown without headers, so read the file in with pandas.read_csv, header=None, and use parse_dates to correctly format the datetime column.
    • The column with datetimes, is shown at column index 1, therefore parse_dates=[1]
    • This assumes the data begins on row 0 in the file and has no headers, as shown in the OP.
  • Create headers for the columns
  • As per a comment, the date component of 'datetime' can be accessed with the .dt accessor.
  • .groupby on name and aggregate .max()
import pandas as pd

# read the file j.csv
df = pd.read_csv('j.csv', header=None, parse_dates=[1])

# add headers
df.columns = ['name', 'datetime']

# select only the date component of datetime
df.datetime = df.datetime.dt.date

# display(df)
     name    datetime
0   Manoj  2020-01-01
1   Manoj  2020-02-01
2   Manoj  2020-03-01
3  Rajesh  2020-01-01
4  Rajesh  2020-05-01
5  Suresh  2020-04-01

# groupby
dfg = df.groupby('name')['datetime'].max().reset_index()

# display(dfg)
     name    datetime
0   Manoj  2020-03-01
1  Rajesh  2020-05-01
2  Suresh  2020-04-01

# save the file. If the headers aren't wanted, use `header=False`
dfg.to_csv('outfile.csv', index=False)
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
  • Thanks a lot.. This answers my question.. Can you pls suggest me if possible. How to send an HTML table in body of the email ? – Ravi Sep 14 '20 at 19:06
  • I tried and now how can i send that html file in body of an email ? – Ravi Sep 14 '20 at 19:24
  • @Ravi i'm not sure, that would be a different question. I tried cutting and pasting into a gmail email, but the html doesn't render. – Trenton McKinney Sep 14 '20 at 19:26
1

Create dataframe

import pandas as pd
df=pd.DataFrame(zip(
    ['Manoj','Manoj','Manoj','Rajesh','Rajesh','Suresh'],
    ['2020-01-01','2020-02-01','2020-03-01','2020-01-01','2020-05-01','2020-04-01'],
    ['01:00:00','01:00:00','01:00:00','01:00:00','01:00:00','01:00:00']),
    columns=['name','date','time'])

Convert date and time from string to date and time object

df['date']=pd.to_datetime(df['date'], infer_datetime_format=True).dt.date
df['time']=pd.to_datetime(df['time'],format='%H:%M:%S').dt.time

Take groupby

out=df.groupby(by=['name','time']).max().reset_index()

You can save and load it again

out.to_csv('out.csv',index=False)
df1=pd.read_csv('out.csv')

result

    name      time        date
0   Manoj  01:00:00  2020-03-01
1  Rajesh  01:00:00  2020-05-01
2  Suresh  01:00:00  2020-04-01

Sorry, I created two separate columns for date and time, but I hope you can understand it

Talha Anwar
  • 2,699
  • 4
  • 23
  • 62