SQLDF extracting the values and saving it to a text file

Question

I input a DBF file into a dataframe and run query.

this are the codes.

from dbf import Table
import pandasql as ps

dfPath1 = Table('filename.dbf')
dfPath1.open()

df1 = pd.DataFrame(dfPath1, columns=['column1', 'column2', 'column3', 'column4'])

hour1 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_12am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '00:00:00' And open_time < '00:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly1 = hour1.fillna(0)
   print(hourly1)

hour2 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_1am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '01:00:00' And open_time < '01:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly2 = hour2.fillna(0)
   print(hourly2)

hour3 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_2am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '02:00:00' And open_time < '02:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly3 = hour3.fillna(0)
   print(hourly3)

hour4 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_3am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '03:00:00' And open_time < '03:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly4 = hour4.fillna(0)
   print(hourly4)

hour5 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_4am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '04:00:00' And open_time < '04:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly5 = hour5.fillna(0)
   print(hourly5)

data = [name,hour1.iloc[:,0],hour1.iloc[:,1],hour1.iloc[:,2],hour1.iloc[:,3],hour1.iloc[:,4],hour1.iloc[:,5],hour1.iloc[:,6]]
data2 = [name,hour2.iloc[:,0],hour2.iloc[:,1],hour2.iloc[:,2],hour2.iloc[:,3],hour2.iloc[:,4],hour2.iloc[:,5],hour2.iloc[:,6]]
data3 = [name,hour3.iloc[:,0],hour3.iloc[:,1],hour3.iloc[:,2],hour3.iloc[:,3],hour3.iloc[:,4],hour3.iloc[:,5],hour3.iloc[:,6]]
data4 = [name,hour4.iloc[:,0],hour4.iloc[:,1],hour4.iloc[:,2],hour4.iloc[:,3],hour4.iloc[:,4],hour4.iloc[:,5],hour4.iloc[:,6]]
data5 = [name,hour5.iloc[:,0],hour5.iloc[:,1],hour5.iloc[:,2],hour5.iloc[:,3],hour5.iloc[:,4],hour5.iloc[:,5],hour5.iloc[:,6]]

hour1['name'] = name
hour1.to_csv('sample_output.txt', index=False, sep=' ')

and then get an error like this.. KeyError: None of [Int64Index([0], dtype='int64')] are in the [columns]

this is the output of the text file that i want.. "2020-01-01 943 527.0 56.46 56.46 0.0 0.0"

You can look into `simpledbf` package: https://stackoverflow.com/questions/41898561/pandas-transform-a-dbf-table-into-a-dataframe — Tim, Nov 12 '21 at 03:06
I already look that and try all of that but im having a hard time to input the dbf file into a dataframe. so far SQLDF is the one that can import dbf file into a dataframe. — Arvin Tuzki, Nov 12 '21 at 03:16

Tim · Accepted Answer · 2021-11-16T02:59:19.193

0

The problem most likely lies in the following line

data = [name,hour1[[0]],hour1[[1]],hour1[[2]],hour1[[3]],hour1[[4]],hour1[[5]],hour1[[6]]]

You can access the columns using iloc

data = [name,hour1.iloc[:,0],hour1.iloc[:,1],hour1.iloc[:,2],hour1.iloc[:,3],hour1.iloc[:,4],hour1.iloc[:,5],hour1.iloc[:,6]]

Though you could've used pandas.DataFrame.to_csv to write to csv more easily. For example,

# add a name column
hour1['name'] = name

# write to csv
hour1.to_csv('sample_output.txt', index=False, sep=' ')

By using sep=' ' , the output wil be separated by space on each row like OP described.

Alternatives:

dbfread package:
https://gist.github.com/jamespaultg/990e4650a384ade5c57a2eb56515ba62 https://dbfread.readthedocs.io/en/latest/exporting_data.html#pandas-data-frames

The DataFrame equivalent operation for that sql query is:

import pandas as pd

df1['open_time_hr'] = pd.to_datetime(df1['open_time']).dt.hour
df1['gross_nat_value'] = df1.taxes+df1.auto_grat+df1.discount
df_agg = df1.groupby(['date', 'session_no', 'open_time_hr']).sum()
df_agg.to_csv('sample_output.txt', index=False, sep=' ')

edited Nov 16 '21 at 02:59

answered Nov 12 '21 at 03:23

Tim

3,178
1
13
26

it generates a text file but the value in text file is like this.........Sample Store,"0 None Name: date, dtype: object","0 None Name: session_number, dtype: object","0 None Name: sales_for_12am, dtype: object","0 None Name: gross_vat_sales, dtype: object","0 None Name: total_vat, dtype: object","0 None Name: discount, dtype: object","0 None Name: service_charge, dtype: object" – Arvin Tuzki Nov 12 '21 at 03:58
the output value of the text file should be like this... "2020-01-01 943 527.0 56.46 56.46 0.0 0.0" – Arvin Tuzki Nov 12 '21 at 04:00
That could be an issue with the sql. When you print `hour1y1`, what does it show? – Tim Nov 12 '21 at 04:09
header = date session_number sales_for_12am gross_vat_sales total_vat discount service_charge, value = 0 0 0 0 0 0 0 – Arvin Tuzki Nov 12 '21 at 04:30
is your text output generated using your method? it looks like that method maybe just writing the string representation of the dataframe columns. If so, you should try the `to_csv` method in this answer. – Tim Nov 12 '21 at 04:39
is it possible to generate the output in text file and not in csv? – Arvin Tuzki Nov 12 '21 at 05:31
Do you have an example desired output? From the question, it seems like you are only naming the file as txt but it’s really a csv, ie a plain text file with comma separated values. – Tim Nov 12 '21 at 13:54
okay. so may i know what codes do I need to generate a text file? – Arvin Tuzki Nov 14 '21 at 09:03
See updated answer – Tim Nov 15 '21 at 00:43
may i know how to put the other output in next line of sample_output.txt? because i have a hour1 up to hour24. – Arvin Tuzki Nov 16 '21 at 01:16
https://stackoverflow.com/a/17975690/10957844 – Tim Nov 16 '21 at 01:20
the generated text file only prints the header. "date session_number sales_for_12am gross_vat_sales total_vat discount service_charge" – Arvin Tuzki Nov 16 '21 at 01:33
Can you update the question with your new code? And copy and paste whatever is printed when you do `print(hour1)`. It’s a little confusing after the lengthy discussion in comment – Tim Nov 16 '21 at 01:39
see updated question. thanks – Arvin Tuzki Nov 16 '21 at 01:47
What are the print outputs? How about using https://dbfread.readthedocs.io/en/latest/exporting_data.html, convert dbf to csv first and then use pandas to get the sums from csv. – Tim Nov 16 '21 at 01:56
the print output contains the header and the value.. but in the generated text file the ouput is this "date session_number sales_for_12am gross_vat_sales total_vat discount service_charge" – Arvin Tuzki Nov 16 '21 at 02:01
how about the `dbfread` package? – Tim Nov 16 '21 at 02:10
i tried using dfread and it works reading the dbf into a dataframe. but when i make a query in dataframe by using sqldf. i have an error of AttributeError: module 'pandasql' has no attribute 'sqldf' – Arvin Tuzki Nov 16 '21 at 02:49
See updated answer. You don't need `pandasql` package. – Tim Nov 16 '21 at 02:59
i already fix the error about sqldf. i just didn't import pandasql. now i got an error on generating the text file. "'list' object has no attribute 'to_csv'" – Arvin Tuzki Nov 16 '21 at 03:21
`hour1` seems like a DataFrame not a list. – Tim Nov 16 '21 at 03:58
do i need to convert hour1? – Arvin Tuzki Nov 16 '21 at 05:15
Hmm..If `hour1.fillna` runs ok, `to_csv` should also work. Maybe try to move them closer. – Tim Nov 16 '21 at 05:32
how to print the hour2 to the generated text file in hour1? – Arvin Tuzki Nov 16 '21 at 06:11
1

i finish it btw thanks for the help. – Arvin Tuzki Nov 16 '21 at 08:32

SQLDF extracting the values and saving it to a text file

1 Answers1

Alternatives: