I have a DataFrame like this:

##for demonstration
import pandas as pd

example = {
"ID": [1, 1, 2, 2, 2, 3],
"place":["Maryland","Maryland", "Washington", "Washington", "Washington", "Los Angeles"],
"type": ["condition", "symptom", "condition", "condition", "sky", "condition"],
"name":  ["depression", "cough", "fatigue", "depression", "blue", "fever" ]
}

#load into df:
example = pd.DataFrame(example)

print(example)

I want to reshape it so that there is one row per unique ID, like this:

#for demonstration
import pandas as pd

result = {
"ID": [1,2,3],
"place":["Maryland","Washington", "Los Angeles"],
"condition": ["depression", "fatigue", "fever"],
"condition1":["no", "depression", "no"],
"symptom": ["cough", "no", "no"],
"sky": ["no", "blue", "no"]
}

#load into df:
result = pd.DataFrame(result)

print(result) 

I tried grouping it like this:

example.nunique()   

df_names = dict()
for k, v in example.groupby('ID'):
    df_names[k] = v

However, this gives me back a dictionary of sub-DataFrames (one per ID), not a single table organized the way I need.

Is there a way to do this, for example with a loop, so that each unique ID gets one row and a new column is created for each type it has (condition, sky, and so on)? If an ID has several conditions, the second one should go into condition1. Any help would be appreciated.

Dharman
Shu
1 Answer

This should give you the result you need. It combines groupby().cumcount() with pivot():

import pandas as pd

df = pd.DataFrame({
"ID": [1, 1, 2, 2, 2, 3],
"place":["Maryland","Maryland", "Washington", "Washington", "Washington", "Los Angeles"],
"type": ["condition", "symptom", "condition", "condition", "sky", "condition"],
"name":  ["depression", "cough", "fatigue", "depression", "blue", "fever" ]
})
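# Number repeated types within each ID/place group: condition_0, condition_1, ...
# (grouping includes 'ID' so that two IDs sharing a place do not share a counter)
df['type'] = df['type'].astype(str) + '_' + df.groupby(['ID', 'place', 'type']).cumcount().astype(str)
# One row per (ID, place), one column per numbered type
df = df.pivot(index=['ID', 'place'], columns='type', values='name').reset_index()
df = df.fillna('no')
# Drop the suffix from first occurrences, so condition_0 becomes condition
df.columns = df.columns.str.replace('_0$', '', regex=True)
df = df[['ID', 'place', 'condition', 'condition_1', 'symptom', 'sky']]
print(df)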
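
If you want the column names to match the layout in the question (condition, condition1 rather than condition_1) and would rather not hard-code the final column list, the same idea can be wrapped in a small function. This is only a sketch: the name `widen`, its `fill` parameter and the temporary `col` column are made up for illustration, and it assumes each ID has a single place.

```python
import pandas as pd


def widen(df: pd.DataFrame, fill: str = "no") -> pd.DataFrame:
    """Hypothetical helper: one row per (ID, place), one column per type occurrence."""
    out = df.copy()
    # Number repeated types within each ID: 0 for the first, 1 for the second, ...
    n = out.groupby(["ID", "type"]).cumcount()
    # The first occurrence keeps the bare type name; later ones get the counter appended.
    out["col"] = out["type"].where(n == 0, out["type"] + n.astype(str))
    wide = out.pivot(index=["ID", "place"], columns="col", values="name")
    wide = wide.fillna(fill).reset_index()
    wide.columns.name = None  # drop the leftover "col" axis label
    return wide


example = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3],
    "place": ["Maryland", "Maryland", "Washington", "Washington", "Washington", "Los Angeles"],
    "type": ["condition", "symptom", "condition", "condition", "sky", "condition"],
    "name": ["depression", "cough", "fatigue", "depression", "blue", "fever"],
})

wide = widen(example)
print(wide)
```

The type columns come out in alphabetical order (condition, condition1, sky, symptom); reorder them afterwards if you need a specific layout.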
ArchAngelPwn