I have looked at "Convert a Pandas DataFrame to a dictionary" for guidance on converting my DataFrame to a dictionary. However, I can't work out how to change my code so that its output is a dictionary.

Below is my code.

import pandas as pd
import collections


governmentcsv = pd.read_csv('government-procurement-via-gebiz.csv', parse_dates=True)  # read the CSV; it contains dates
extract = governmentcsv.loc[:, ['supplier_name', 'award_date']].copy()  # keep only these columns (copy avoids SettingWithCopyWarning)

extract['award_date'] = pd.to_datetime(extract['award_date'])

def extract_supplier_not_na_2015():
    # keep rows with a known supplier that were awarded in 2015
    notNAFifteen = extract[(extract.supplier_name != 'na') & (extract.award_date.dt.year == 2015)]
    notNAFifteen = notNAFifteen.reset_index(drop=True)  # reset index
    notNAFifteen.index += 1  # make the index start from 1
    #SortednotNAFifteen = collections.orderedDictionary(notNAFifteen)

    return notNAFifteen

print(extract_supplier_not_na_2015())

The OUTPUT is:

                                          supplier_name award_date
1                               RMA CONTRACTS PTE. LTD. 2015-01-28
2     TESCOM (SINGAPORE) SOFTWARE SYSTEMS TESTING PT... 2015-07-01
3                                  MKS GLOBAL PTE. LTD. 2015-04-24
4               CERTIS TECHNOLOGY (SINGAPORE) PTE. LTD. 2015-06-26
5                    RHT COMPLIANCE SOLUTIONS PTE. LTD. 2015-08-14
6                                   CLEANMAGE PTE. LTD. 2015-07-30
7                             SOLUTIONSATWORK PTE. LTD. 2015-11-23
8                                       Ufinity Pte Ltd 2015-05-04
9                                         NCS PTE. LTD. 2015-01-28
John Wick

1 Answer

I think I found this dataset: https://data.gov.sg/dataset/government-procurement

Anyway, here is the code:

import pandas as pd


df = pd.read_csv('government-procurement-via-gebiz.csv', 
                  encoding='unicode_escape', 
                  parse_dates=['award_date'], 
                  infer_datetime_format=True,
                  usecols=['supplier_name', 'award_date'], 
)

df = df[(df['supplier_name'] != 'Unknown') & (df['award_date'].dt.year == 2015)].reset_index(drop=True)

#Faster way:
d1 = df.set_index('supplier_name').to_dict()['award_date']

#Alternatively:
d2 = dict(zip(df['supplier_name'], df['award_date']))
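As a minimal sketch of what the two conversions above produce, here they are on a small hand-made frame. The supplier names and dates are made-up stand-ins for the CSV, and `toy`, `via_to_dict` and `via_zip` are illustrative names only. When every supplier_name is unique, both approaches should give the same name-to-date mapping:

```python
import pandas as pd

# Toy data standing in for the filtered procurement rows (values are made up).
toy = pd.DataFrame({
    "supplier_name": ["A PTE. LTD.", "B PTE. LTD.", "C PTE. LTD."],
    "award_date": pd.to_datetime(["2015-01-28", "2015-07-01", "2015-04-24"]),
})

# DataFrame.to_dict() returns {column: {index: value}}, so pick the column.
via_to_dict = toy.set_index("supplier_name").to_dict()["award_date"]

# Pair the two columns element by element and build the dict directly.
via_zip = dict(zip(toy["supplier_name"], toy["award_date"]))

print(via_to_dict == via_zip)
```

Both results map each name to a pandas Timestamp. A dict holds one value per key, so this only preserves every row if the names are unique.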
Quant Christo
  • Thanks, that works. However, as seen in my code, I need to extract both the supplier_name and the rows where the award_date year is 2015. – John Wick Oct 19 '19 at 10:08
  • I've added an additional step; you can do the same for NaNs or other forms of filtering. As far as I understood, the main problem was the conversion of the df to a dict. – Quant Christo Oct 19 '19 at 10:18
  • Probably using `notNAFifteen` instead of `extract` in the snippet above will work with minimal changes to your code. – Quant Christo Oct 19 '19 at 10:21
  • Sorry, I have edited my code and the output is {'na': 'award_date'} – John Wick Oct 19 '19 at 11:02
  • Hi, sorry, I made a mistake: I found out that supplier_name isn't unique. A few rows share the same supplier_name, so I think the code overwrites the earlier entry whenever the exact same supplier_name appears again. I noticed this when I tried to extract the top 5 largest values using your code. The top value is supposed to belong to "SANTARLI CONSTRUCTION PTE. LTD.". However, there are 2 "SANTARLI CONSTRUCTION PTE. LTD." rows in the supplier_name column, so the entry got overwritten by the 2nd one, which has a lower value. – John Wick Oct 20 '19 at 07:53
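
The overwriting described in the last comment can be reproduced on toy data. This is only a sketch: the supplier names, the `amount` column and its values are hypothetical, not taken from the real CSV. It shows that a plain dict keeps the last row for a repeated name, and two possible ways around that using `groupby`:

```python
import pandas as pd

# Toy rows with a repeated supplier; "amount" is a hypothetical column name.
toy = pd.DataFrame({
    "supplier_name": ["X LTD.", "Y LTD.", "X LTD."],
    "amount": [500.0, 200.0, 100.0],
})

# A dict holds one value per key, so the later "X LTD." row replaces the earlier one.
last_wins = dict(zip(toy["supplier_name"], toy["amount"]))

# Option 1: keep every value for each supplier as a list.
all_values = toy.groupby("supplier_name")["amount"].apply(list).to_dict()

# Option 2: reduce to one value per supplier, here the largest amount.
largest = toy.groupby("supplier_name")["amount"].max().to_dict()

print(last_wins)
print(all_values)
print(largest)
```

Which option fits depends on the goal: for a "top 5 by value" query, it may be simpler to sort the DataFrame itself (for example with `nlargest`) and skip the dictionary, since rows can share a name but dict keys cannot.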