1

TASK 1:

I have table like this:

+----------+------------+----------+------------+----------+------------+-------+
| a_name_0 | id_qname_0 | a_name_1 | id_qname_1 | a_name_2 | id_qname_2 | count |
+----------+------------+----------+------------+----------+------------+-------+
| country  | 1          | NAN      | NAN        | NAN      | NAN        | 100   |
+----------+------------+----------+------------+----------+------------+-------+
| region   | 2          | city     | 8          | NAN      | NAN        | 20    |
+----------+------------+----------+------------+----------+------------+-------+
| region   | 2          | city     | 9          | NAN      | NAN        | 80    |
+----------+------------+----------+------------+----------+------------+-------+
| region   | 3          | age      | 4          | sex      | 6          | 40    |
+----------+------------+----------+------------+----------+------------+-------+
| region   | 3          | age      | 5          | sex      | 7          | 60    |
+----------+------------+----------+------------+----------+------------+-------+

I need to turn each row in series, drop NANs and convert series in a dictionaries which will be variable in size, for example, first 2 dicts will look like this:

{'a_name_0':'country','id_qname_0':1}
{'a_name_0':'region','id_qname_0':2, 'a_name_1':'city','id_qname_1':8}
{'a_name_0':'region','id_qname_0':2, 'a_name_1':'city','id_qname_1':9}

Each dictionary after that should be stored in a list.

TASK 2.

Using table below I have to count appearance of columns from dict from previous step:

+----------+------------+----------+------------+----------+
| id       | country    | city     | age        | sex      | 
+----------+------------+----------+------------+----------+
| 1        | 1          | NAN      | NAN        | NAN      | 
+----------+------------+----------+------------+----------+
| 2        | 1          | 8        | NAN        | NAN      | 
+----------+------------+----------+------------+----------+

If there is some faster mapping solution please advise since what I'm about to do is probably going to be quite messy. This answer doesn't help me since I need iterator for extracting parameters as well as counting their appearance.

Hrvoje
  • 13,566
  • 7
  • 90
  • 104

1 Answers1

3

You can remove count column and convert all rows to list of dicts by DataFrame.to_dict with orient='r' (records) and then filter out dicts with missing values in dictionary comprehension:

L = [{k:v for k, v in x.items() if pd.notna(v)} for x in df.drop('count', 1).to_dict('r')]
print (L)
[{'a_name_0': 'country', 'id_qname_0': 1},
 {'a_name_0': 'region', 'id_qname_0': 2, 'a_name_1': 'city', 'id_qname_1': 8.0}, 
 {'a_name_0': 'region', 'id_qname_0': 2, 'a_name_1': 'city', 'id_qname_1': 9.0}, 
 {'a_name_0': 'region', 'id_qname_0': 3, 'a_name_1': 'age', 
 'id_qname_1': 4.0, 'a_name_2': 'sex', 'id_qname_2': 6.0},
 {'a_name_0': 'region', 'id_qname_0': 3, 'a_name_1': 'age',
 'id_qname_1': 5.0, 'a_name_2': 'sex', 'id_qname_2': 7.0}]

Not 100% sure for second DataFrame:

L1 = [dict(zip(list(x.values())[::2], list(x.values())[1::2])) for x in L]
df = pd.DataFrame(L1)
print (df)
   country  region  city  age  sex
0      1.0     NaN   NaN  NaN  NaN
1      NaN     2.0   8.0  NaN  NaN
2      NaN     2.0   9.0  NaN  NaN
3      NaN     3.0   NaN  4.0  6.0
4      NaN     3.0   NaN  5.0  7.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    nice, better than my solution `for index, row in df.drop('count',axis=1).iterrows(): list_.append(row.dropna().to_json(orient='index'))` – Umar.H Dec 30 '19 at 10:23
  • 2
    @Datanovice list comprehension is way to go: https://stackoverflow.com/a/55557758/2119941 but I'll take any solution that works since this is not big data case. Thank you Jozef my friend! You did it again ;) – Hrvoje Dec 30 '19 at 12:37