0

I input a DBF file into a dataframe and run query.

this are the codes.

from dbf import Table
import pandasql as ps

dfPath1 = Table('filename.dbf')
dfPath1.open()

df1 = pd.DataFrame(dfPath1, columns=['column1', 'column2', 'column3', 'column4'])

hour1 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_12am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '00:00:00' And open_time < '00:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly1 = hour1.fillna(0)
   print(hourly1)

hour2 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_1am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '01:00:00' And open_time < '01:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly2 = hour2.fillna(0)
   print(hourly2)

hour3 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_2am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '02:00:00' And open_time < '02:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly3 = hour3.fillna(0)
   print(hourly3)

hour4 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_3am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '03:00:00' And open_time < '03:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly4 = hour4.fillna(0)
   print(hourly4)

hour5 = ps.sqldf("Select df1.date, df1.session_no AS 'session_number', SUM(df1.received) AS 'sales_for_4am', SUM(df1.taxes)+SUM(df1.auto_grat)+SUM(df1.discount) AS 'gross_vat_sales', SUM(df1.taxes) AS 'total_vat', SUM(df1.discount) AS 'discount', SUM(df1.auto_grat) AS 'service_charge' From df1 Where open_time >= '04:00:00' And open_time < '04:59:59' And date= '" + str + "'")
with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', None):
   hourly5 = hour5.fillna(0)
   print(hourly5)

data = [name,hour1.iloc[:,0],hour1.iloc[:,1],hour1.iloc[:,2],hour1.iloc[:,3],hour1.iloc[:,4],hour1.iloc[:,5],hour1.iloc[:,6]]
data2 = [name,hour2.iloc[:,0],hour2.iloc[:,1],hour2.iloc[:,2],hour2.iloc[:,3],hour2.iloc[:,4],hour2.iloc[:,5],hour2.iloc[:,6]]
data3 = [name,hour3.iloc[:,0],hour3.iloc[:,1],hour3.iloc[:,2],hour3.iloc[:,3],hour3.iloc[:,4],hour3.iloc[:,5],hour3.iloc[:,6]]
data4 = [name,hour4.iloc[:,0],hour4.iloc[:,1],hour4.iloc[:,2],hour4.iloc[:,3],hour4.iloc[:,4],hour4.iloc[:,5],hour4.iloc[:,6]]
data5 = [name,hour5.iloc[:,0],hour5.iloc[:,1],hour5.iloc[:,2],hour5.iloc[:,3],hour5.iloc[:,4],hour5.iloc[:,5],hour5.iloc[:,6]]

hour1['name'] = name
hour1.to_csv('sample_output.txt', index=False, sep=' ')

and then get an error like this.. KeyError: None of [Int64Index([0], dtype='int64')] are in the [columns]

this is the output of the text file that i want.. "2020-01-01 943 527.0 56.46 56.46 0.0 0.0"

  • You can look into `simpledbf` package: https://stackoverflow.com/questions/41898561/pandas-transform-a-dbf-table-into-a-dataframe – Tim Nov 12 '21 at 03:06
  • I already look that and try all of that but im having a hard time to input the dbf file into a dataframe. so far SQLDF is the one that can import dbf file into a dataframe. – Arvin Tuzki Nov 12 '21 at 03:16
  • Why is this tagged R? – camille Nov 16 '21 at 02:30

1 Answers1

0

The problem most likely lies in the following line

data = [name,hour1[[0]],hour1[[1]],hour1[[2]],hour1[[3]],hour1[[4]],hour1[[5]],hour1[[6]]]

You can access the columns using iloc

data = [name,hour1.iloc[:,0],hour1.iloc[:,1],hour1.iloc[:,2],hour1.iloc[:,3],hour1.iloc[:,4],hour1.iloc[:,5],hour1.iloc[:,6]]

Though you could've used pandas.DataFrame.to_csv to write to csv more easily. For example,

# add a name column
hour1['name'] = name

# write to csv
hour1.to_csv('sample_output.txt', index=False, sep=' ')

By using sep=' ' , the output wil be separated by space on each row like OP described.


Alternatives:

dbfread package:
https://gist.github.com/jamespaultg/990e4650a384ade5c57a2eb56515ba62 https://dbfread.readthedocs.io/en/latest/exporting_data.html#pandas-data-frames

The DataFrame equivalent operation for that sql query is:

import pandas as pd

df1['open_time_hr'] = pd.to_datetime(df1['open_time']).dt.hour
df1['gross_nat_value'] = df1.taxes+df1.auto_grat+df1.discount
df_agg = df1.groupby(['date', 'session_no', 'open_time_hr']).sum()
df_agg.to_csv('sample_output.txt', index=False, sep=' ')
Tim
  • 3,178
  • 1
  • 13
  • 26
  • it generates a text file but the value in text file is like this.........Sample Store,"0 None Name: date, dtype: object","0 None Name: session_number, dtype: object","0 None Name: sales_for_12am, dtype: object","0 None Name: gross_vat_sales, dtype: object","0 None Name: total_vat, dtype: object","0 None Name: discount, dtype: object","0 None Name: service_charge, dtype: object" – Arvin Tuzki Nov 12 '21 at 03:58
  • the output value of the text file should be like this... "2020-01-01 943 527.0 56.46 56.46 0.0 0.0" – Arvin Tuzki Nov 12 '21 at 04:00
  • That could be an issue with the sql. When you print `hour1y1`, what does it show? – Tim Nov 12 '21 at 04:09
  • header = date session_number sales_for_12am gross_vat_sales total_vat discount service_charge, value = 0 0 0 0 0 0 0 – Arvin Tuzki Nov 12 '21 at 04:30
  • is your text output generated using your method? it looks like that method maybe just writing the string representation of the dataframe columns. If so, you should try the `to_csv` method in this answer. – Tim Nov 12 '21 at 04:39
  • is it possible to generate the output in text file and not in csv? – Arvin Tuzki Nov 12 '21 at 05:31
  • Do you have an example desired output? From the question, it seems like you are only naming the file as txt but it’s really a csv, ie a plain text file with comma separated values. – Tim Nov 12 '21 at 13:54
  • okay. so may i know what codes do I need to generate a text file? – Arvin Tuzki Nov 14 '21 at 09:03
  • See updated answer – Tim Nov 15 '21 at 00:43
  • may i know how to put the other output in next line of sample_output.txt? because i have a hour1 up to hour24. – Arvin Tuzki Nov 16 '21 at 01:16
  • https://stackoverflow.com/a/17975690/10957844 – Tim Nov 16 '21 at 01:20
  • the generated text file only prints the header. "date session_number sales_for_12am gross_vat_sales total_vat discount service_charge" – Arvin Tuzki Nov 16 '21 at 01:33
  • Can you update the question with your new code? And copy and paste whatever is printed when you do `print(hour1)`. It’s a little confusing after the lengthy discussion in comment – Tim Nov 16 '21 at 01:39
  • see updated question. thanks – Arvin Tuzki Nov 16 '21 at 01:47
  • What are the print outputs? How about using https://dbfread.readthedocs.io/en/latest/exporting_data.html, convert dbf to csv first and then use pandas to get the sums from csv. – Tim Nov 16 '21 at 01:56
  • the print output contains the header and the value.. but in the generated text file the ouput is this "date session_number sales_for_12am gross_vat_sales total_vat discount service_charge" – Arvin Tuzki Nov 16 '21 at 02:01
  • how about the `dbfread` package? – Tim Nov 16 '21 at 02:10
  • i tried using dfread and it works reading the dbf into a dataframe. but when i make a query in dataframe by using sqldf. i have an error of AttributeError: module 'pandasql' has no attribute 'sqldf' – Arvin Tuzki Nov 16 '21 at 02:49
  • See updated answer. You don't need `pandasql` package. – Tim Nov 16 '21 at 02:59
  • i already fix the error about sqldf. i just didn't import pandasql. now i got an error on generating the text file. "'list' object has no attribute 'to_csv'" – Arvin Tuzki Nov 16 '21 at 03:21
  • `hour1` seems like a DataFrame not a list. – Tim Nov 16 '21 at 03:58
  • do i need to convert hour1? – Arvin Tuzki Nov 16 '21 at 05:15
  • Hmm..If `hour1.fillna` runs ok, `to_csv` should also work. Maybe try to move them closer. – Tim Nov 16 '21 at 05:32
  • how to print the hour2 to the generated text file in hour1? – Arvin Tuzki Nov 16 '21 at 06:11
  • 1
    i finish it btw thanks for the help. – Arvin Tuzki Nov 16 '21 at 08:32