0

I have data in the following CSV file, available here:

http://s000.tinyupload.com/index.php?file_id=87473936848618674050

Screenshot of the CSV:

enter image description here

I've written the following code to import the CSV file into Python as a Pandas Dataframe, and then the code after that creates a dictionary dict. The dictionary has to have name and region as the keys, and the Windows and Linux prices as the dictionary values.

#Import libraries and CSV file into dataframe, renaming columns, printing head

import pandas as pd

df = pd.read_csv('file.csv')

col_names = ['Name','Region','API', 'Memory','vCPU', 'Storage', 'Linux', 'Windows' ]

df.columns = col_names

#Creating Dict
dict = {}

for i in df.index:

    key = (df.at[i, 'Name'] , df.at[i, 'Region'])
    value = (df.at[i, 'vCPU'], df.at[i, 'Memory'], df.at[i, 'Storage'], df.at[i, 'Windows'] , df.at[i, 'Linux'])

    dictionary = {key:value}
    dict.update(dictionary)

I now would like to write a function that would allows me to search through the dictionary.

For example, the user would input "32" for vCPUs, the function would bring back the region, name and Linux and Windows prices for any processors that have 32 vCPUs.

Later, I want to implement this search function for vCPU, Memory and Storage. (the full CSV has 1700 rows). Would really appreciate someone helping me out.

Rohan Pillai
  • 917
  • 3
  • 17
  • 26
Yung Bas
  • 31
  • 8

2 Answers2

1

Why not just search the dataframe? Your query code could generalize the following.

for index, row in df.loc[df['vCPU'] == '32 vCPUs'].iterrows():
    print (row['Region'] + ', ' + row['Name'] + ', Linux price: '+ row['Linux'] + ', Windows price: '+ row['Windows'])

Output:

US West - NorCal, Cluster Compute Eight Extra Large, Linux price: unavailable, Windows price: unavailable
US East - Ohio, I2 Eight Extra Large, Linux price: $6.820000 hourly, Windows price: $7.782000 hourly
APAC - Singapore, I3 High I/O Eight Extra Large, Linux price: $2.992000 hourly, Windows price: $4.464000 hourly

Here's more code to answer your follow-up comments. Above, I was showing how you can find data in a dataframe. Here's a bit more code that I hope adequately demonstrates how to strip out labels like "GiB", convert to values, iterate over matching values, etc. You have several use cases, so my hope is that this code gives you a base to build on. For getting closest matches, see the answers to this question.

# strip out the "GiB" and convert to float values
df['Memory'] = df['Memory'].str.split(' ').str[0].astype(float)

# use whatever code you need to get input from user
cpu_request = '2 vCPUs'
mem_request = 3

matches = df.loc[(df['vCPU'] == cpu_request)]
if matches.empty == 'True':
    print ('No matches.')
else:
    for index, row in matches.loc[(matches['Memory'] >= mem_request)].iterrows():
        print(row['Name'] + ':')
        # you could add another loop here if your data can have multiple entries per name.
        print ('\t' + row['Region'] + ', ' + str(row['Memory']) + ' GiB, Linux price: '+ row['Linux'] + ', Windows price: '+ row['Windows'])
David Gaertner
  • 386
  • 2
  • 7
  • That works too. How would I change this code to take in user input for multiple variables? For example, if I want to enter in vCPU and Memory, and output those. Or, if there is a match for vCPU but not for memory, how would I output the processors with vCPU matches but with the next highest Memory level? – Yung Bas Jul 06 '18 at 00:07
  • Also, I liked the way yours was formatted. However, it lists out each of the results, regardless of the region. I'm looking to have the result show just the name once, followed by the price of that product in each of the regions. Assume that each processor "name" has the same attributes in 15 regions, but the prices in each region are different – Yung Bas Jul 06 '18 at 00:17
0

You are going to overwrite all of the data if you have multiple values for key.

In [4]: d = {}

In [5]: d.update({1:1})

In [6]: d.update({1:2})

In [7]: d
Out[7]: {1: 2}

You would have to create a dict with a key mapping to a list of values

for i in df.index:
    key = (df.at[i, 'Name'] , df.at[i, 'Region'])
    value = (df.at[i, 'vCPU'], df.at[i, 'Memory'], df.at[i, 'Storage'], df.at[i, 'Windows'] , df.at[i, 'Linux'])

    if key in dict:
        dict[key].append(value)
    else:
        dict[key] = [value]

But all of this is kind of superfluous. You should be using the DataFrame.

aydow
  • 3,673
  • 2
  • 23
  • 40
  • do I set d to equal something in this case? I'm calling just d, and its not pulling in all of the keys and values using your code, just a few of them – Yung Bas Jul 10 '18 at 16:17
  • and is there any way I can get logic within this code? for example, if there is no match for CPU, how would I get it to return the next highest CPU (in cores)? – Yung Bas Jul 10 '18 at 16:17