Pandas beginner here. I have a CSV file that I have opened using Pandas. The format of the file is as follows:

PatientId    x    y    width    height    target
A12kxk       23   45   10       20        1
Aldkd2       92   22   12       30        1
Aldkd2       29   11   98       34        1
Alll34                                    0

I want to get a dictionary with the PatientId as the key, where the value is a 2D array containing the x, y, width, and height of that patient, one table row per array row, with the rows stacked like this:

Dictionary["Aldkd2"] = 92 22 12 30 29 11 98 34

I want to discard the rows that have 0 in target. A single PatientId can have one or more rows in the table. How can I do this?

Mohamed Thasin ah
  • Possible duplicate of https://stackoverflow.com/questions/31789160/convert-select-columns-in-pandas-dataframe-to-numpy-array and https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas – cyril Oct 31 '18 at 14:35
  • @cyril I don't think it's dup as above – Sociopath Oct 31 '18 at 14:42

2 Answers

I hope this will solve your problem,

dic= df.groupby('PatientId').apply(lambda x:x[['x','y','width','height']].values.tolist()).to_dict()

Output:

{'Aldkd2': [[92.0, 22.0, 12.0, 30.0], [29.0, 11.0, 98.0, 34.0]], 'Alll34': [[nan, 0.0, nan, nan]], 'A12kxk': [[23.0, 45.0, 10.0, 20.0]]}

Now you can look up any patient as you wish:

print(dic['Aldkd2'])

Output:

[[92.0, 22.0, 12.0, 30.0], [29.0, 11.0, 98.0, 34.0]]
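
Note that the one-liner above keeps every patient, including rows with 0 in target. A minimal sketch of one way to drop those rows first and get a 2D NumPy array per patient, assuming the sample data from the question is available as comma-separated text (the inline `csv_text` and the names `kept` and `boxes` are hypothetical, chosen for illustration):

```python
import io

import pandas as pd

# Hypothetical inline copy of the question's sample data, comma-separated.
csv_text = """PatientId,x,y,width,height,target
A12kxk,23,45,10,20,1
Aldkd2,92,22,12,30,1
Aldkd2,29,11,98,34,1
Alll34,,,,,0
"""
df = pd.read_csv(io.StringIO(csv_text))

cols = ['x', 'y', 'width', 'height']

# Keep only the rows whose target is non-zero.
kept = df[df['target'] != 0]

# Build one 2D array per patient: one array row per table row.
boxes = {pid: group[cols].to_numpy() for pid, group in kept.groupby('PatientId')}

print(boxes['Aldkd2'])
```

Because the x, y, width, and height columns contain missing values in the raw file, pandas reads them as floats, so the arrays hold floats as well. Cast with `.astype(int)` after filtering if integers are needed.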
Mohamed Thasin ah

Using Pandas, you can read in the data into a Pandas Dataframe like so:

import pandas as pd
df = pd.read_csv('data.csv')

At that point, the DataFrame's values attribute contains the table data as an array of rows. You can iterate through this data to extract and construct the dictionary that you're looking for. Something along the lines of:

patient_info_dict = {}
for row in df.values:
    # The first value in 'row' is the dictionary key (PatientId),
    # and the last value is the target.

    # Skip rows whose target is 0, as the question asks
    if row[-1] == 0:
        continue

    # If the patient id is not yet a key in the dictionary,
    # initialize an empty list for it
    if row[0] not in patient_info_dict:
        patient_info_dict[row[0]] = []

    # Append the remaining data except for the key and the last value
    patient_info_dict[row[0]].append(row[1:-1])

If you print the dictionary with:

print(patient_info_dict)

You get the following output:

{'A12kxk': [array([23, 45, 10, 20], dtype=object)], 'Aldkd2': [array([92, 22, 12, 30], dtype=object), array([29, 11, 98, 34], dtype=object)]}

The other answer is definitely more Pythonic, and likely more efficient. However, if you're new to Python/Pandas, this might be helpful in understanding what exactly is happening.
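
As a middle ground between the two answers, here is a hedged sketch of the same loop idea using collections.defaultdict and DataFrame.itertuples, with the per-patient lists stacked into 2D NumPy arrays at the end. The inline `csv_text` and the names `rows_by_patient` and `patient_arrays` are assumptions made for illustration, not part of the original data file:

```python
import io
from collections import defaultdict

import numpy as np
import pandas as pd

# Hypothetical inline copy of the question's sample data, comma-separated.
csv_text = """PatientId,x,y,width,height,target
A12kxk,23,45,10,20,1
Aldkd2,92,22,12,30,1
Aldkd2,29,11,98,34,1
Alll34,,,,,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# defaultdict(list) creates the empty list on first access,
# so no explicit "is the key already present" check is needed.
rows_by_patient = defaultdict(list)

for row in df.itertuples(index=False):
    # Skip rows whose target is 0.
    if row.target == 0:
        continue
    rows_by_patient[row.PatientId].append([row.x, row.y, row.width, row.height])

# Stack each patient's rows into a 2D array of shape (n_rows, 4).
patient_arrays = {pid: np.array(rows) for pid, rows in rows_by_patient.items()}

print(patient_arrays['Aldkd2'].shape)
```

Accessing fields by name (row.x, row.target) avoids relying on column positions, which makes the loop a little more robust if the column order in the file changes.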

  • A big thanks. Although the other answer is what I am gonna use for now, but knowing how I can complete the task with less use of inbuilt functions will certainly be more useful in the long run. – user8850564 Oct 31 '18 at 16:27