84

I used Counter on a list to compute this variable:

final = Counter(event_container)

print final gives:

Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'rt_home_start_app': 46, 'fb_view_wishlist': 39, 'fb_view_product': 37, 'fb_search': 29, 'rt_view_product': 23, 'fb_view_cart': 22, 'rt_search': 12, 'rt_view_cart': 12, 'add_to_cart': 2, 'create_campaign': 1, 'fb_connect': 1, 'sale': 1, 'guest_sale': 1, 'remove_from_cart': 1, 'rt_transaction_confirmation': 1, 'login': 1})

Now I want to convert final into a pandas DataFrame, but when I run:

final_df = pd.DataFrame(final)

I get an error.

I guess final is not a proper dictionary, so how can I convert final to a dictionary? Or is there another way to convert final to a DataFrame?

EdChum
woshitom
  • What do you want the final df to look like? Do you want each entry to be a column or a row? – EdChum Jun 29 '15 at 08:38

5 Answers

128

You can construct the DataFrame with from_dict, passing orient='index', and then call reset_index to get a two-column DataFrame:

In [40]:
import pandas as pd
from collections import Counter
d = Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'rt_home_start_app': 46, 'fb_view_wishlist': 39, 'fb_view_product': 37, 'fb_search': 29, 'rt_view_product': 23, 'fb_view_cart': 22, 'rt_search': 12, 'rt_view_cart': 12, 'add_to_cart': 2, 'create_campaign': 1, 'fb_connect': 1, 'sale': 1, 'guest_sale': 1, 'remove_from_cart': 1, 'rt_transaction_confirmation': 1, 'login': 1})
df = pd.DataFrame.from_dict(d, orient='index').reset_index()
df

Out[40]:
                          index   0
0                         login   1
1   rt_transaction_confirmation   1
2                  fb_view_cart  22
3                    fb_connect   1
4               rt_view_product  23
5                     fb_search  29
6                          sale   1
7               fb_view_listing  76
8                   add_to_cart   2
9                  rt_view_cart  12
10                fb_homescreen  63
11              fb_view_product  37
12            rt_home_start_app  46
13             fb_view_wishlist  39
14              create_campaign   1
15                    rt_search  12
16                   guest_sale   1
17             remove_from_cart   1
18              rt_view_listing  50

You can rename the columns to something more meaningful:

In [43]:
df = df.rename(columns={'index':'event', 0:'count'})
df

Out[43]:
                          event  count
0                         login      1
1   rt_transaction_confirmation      1
2                  fb_view_cart     22
3                    fb_connect      1
4               rt_view_product     23
5                     fb_search     29
6                          sale      1
7               fb_view_listing     76
8                   add_to_cart      2
9                  rt_view_cart     12
10                fb_homescreen     63
11              fb_view_product     37
12            rt_home_start_app     46
13             fb_view_wishlist     39
14              create_campaign      1
15                    rt_search     12
16                   guest_sale      1
17             remove_from_cart      1
18              rt_view_listing     50
EdChum
  • 1
    Thank you! Why do you need to pass the param orient='index'? It certainly works, and thank you for your answer, but in trying to educate myself I don't understand why this param is necessary. – Heather Claxton Oct 14 '19 at 00:23
  • 1
    @HeatherClaxton if you didn't pass `orient='index'` then it will raise a `ValueError` because it then expects you to pass an index, you should experiment with the params to understand the difference in output – EdChum Oct 14 '19 at 08:30
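To make the role of orient concrete, here is a minimal sketch on a small, made-up Counter (the name counts and the sample events are assumptions for illustration, not data from the question). With the default orient='columns', each key is treated as a column name, and scalar values without an index raise a ValueError. With orient='index', each key becomes a row label instead.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data, not the Counter from the question.
counts = Counter(["login", "sale", "login", "search", "login"])

# Default orient='columns': keys are treated as column names. With scalar
# values and no index, pandas raises a ValueError.
try:
    pd.DataFrame.from_dict(counts)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# orient='index': keys become row labels and the counts fill one column.
df = pd.DataFrame.from_dict(counts, orient="index").reset_index()
df.columns = ["event", "count"]  # illustrative column names
print(df)
```

The column names are assigned directly here only to keep the sketch short; rename, as in the answer above, does the same job.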
15

Another option is to use the DataFrame.from_records method:

import pandas as pd
from collections import Counter

c = Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'rt_home_start_app': 46, 'fb_view_wishlist': 39, 'fb_view_product': 37, 'fb_search': 29, 'rt_view_product': 23, 'fb_view_cart': 22, 'rt_search': 12, 'rt_view_cart': 12, 'add_to_cart': 2, 'create_campaign': 1, 'fb_connect': 1, 'sale': 1, 'guest_sale': 1, 'remove_from_cart': 1, 'rt_transaction_confirmation': 1, 'login': 1})

df = pd.DataFrame.from_records(list(dict(c).items()), columns=['page','count'])

It's a one-liner and speed seems to be the same.

Or use this variant to have them sorted by most used. Again the performance is about the same.

df = pd.DataFrame.from_records(c.most_common(), columns=['page','count'])
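As a rough sketch of how the two variants differ (the three-entry Counter below is a trimmed-down, illustrative sample): items() yields pairs in insertion order, while most_common() yields them sorted by descending count. Since Counter is a dict subclass, list(c.items()) should work without the intermediate dict(c).

```python
from collections import Counter

import pandas as pd

# Small illustrative sample, deliberately not sorted by count.
c = Counter({"fb_search": 29, "fb_view_listing": 76, "login": 1})

# Insertion order is kept when building from items().
df_unsorted = pd.DataFrame.from_records(list(c.items()), columns=["page", "count"])

# most_common() returns (item, count) tuples, highest count first.
df_sorted = pd.DataFrame.from_records(c.most_common(), columns=["page", "count"])

print(df_unsorted)
print(df_sorted)
```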
pvasek
  • 2
    I nominate this as best answer because you don't need to rename columns afterwards, and it avoids having a data column that is treated as an index rather than an indexed column (due to the "orient" argument in from_dict) – heretomurimudamura Oct 14 '20 at 04:26
  • +1 this solution has lower runtime than the accepted answer. I used `%%time` in `jupyter-notebook` to find the runtime. This one has runtime in `microsecond` range, whereas the accepted answer has two operations and the runtime is in `millisecond` range for `30` entries. – hafiz031 Dec 17 '21 at 07:18
6

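If you want one row per key, set the keyword argument orient='index' when creating a DataFrame from a dictionary using from_dict. The keys become the index and the counts fill a single column; call reset_index() on the result if you want the keys as an ordinary second column: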

final_df = pd.DataFrame.from_dict(final, orient='index')

See the documentation on DataFrame.from_dict
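A minimal sketch of that approach, assuming a reasonably recent pandas where from_dict accepts a columns argument together with orient='index'. The column names event and count are illustrative, and the Counter is a shortened sample.

```python
from collections import Counter

import pandas as pd

# Shortened sample of the Counter from the question.
final = Counter({"fb_view_listing": 76, "fb_homescreen": 63, "login": 1})

# Keys become the index; the single data column is named up front.
one_col = pd.DataFrame.from_dict(final, orient="index", columns=["count"])

# Name the index, then move it into a regular column.
final_df = one_col.rename_axis("event").reset_index()
print(final_df)
```

Naming the column in from_dict avoids a separate rename step for the default column label 0.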

galath
  • Thanks, but this gives me a 1-row, n-column DataFrame. How can I get an n-row, 2-column DataFrame? – woshitom Jun 29 '15 at 08:39
1

I found it more useful to transform the Counter to a pandas Series that is already ordered by count and where the ordered items are the index, so I used zip:

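import pandas as pd

def counter_to_series(counter):
    # An empty Counter cannot be unpacked into (items, counts) below.
    if not counter:
        return pd.Series(dtype='int64')
    # most_common() returns (item, count) tuples, highest count first.
    counter_as_tuples = counter.most_common()

    items, counts = zip(*counter_as_tuples)
    return pd.Series(counts, index=items)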

The most_common method of the Counter object returns a list of (item, count) tuples. Unpacking the result of zip raises a ValueError when the Counter has no items, so an empty Counter must be handled beforehand.
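A possible alternative, sketched here under the assumption of Python 3.7+ (where dicts keep insertion order): build the Series straight from a dict of the most_common() pairs. Passing an explicit dtype means the empty case needs no special handling. The function name counter_to_series_sketch is hypothetical.

```python
from collections import Counter

import pandas as pd


def counter_to_series_sketch(counter):
    """Return a Series of counts indexed by item, highest count first."""
    # dict() keeps the order produced by most_common().
    ordered = dict(counter.most_common())
    # An explicit dtype also covers the empty Counter.
    return pd.Series(ordered, dtype="int64")


s = counter_to_series_sketch(Counter("aabbbc"))
empty = counter_to_series_sketch(Counter())
print(s)
```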

Suzana
0

The error you got was probably "If using all scalar values, you must pass an index." To fix this, just provide an index (e.g., "count") and then transpose:

final_df = pd.DataFrame(final, index=['count']).transpose()

Done. You can rename the index afterwards if you wish.
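For illustration, a small sketch of the shapes involved, using a two-entry sample Counter. The index name event is an assumption, and rename_axis followed by reset_index is just one way to turn the index into a regular column.

```python
from collections import Counter

import pandas as pd

# Two-entry sample of the Counter from the question.
final = Counter({"fb_view_listing": 76, "sale": 1})

# One row labelled 'count', one column per key.
wide = pd.DataFrame(final, index=["count"])

# Transposing gives one row per key and a single 'count' column.
tall = wide.transpose()

# Optionally name the index and turn it into a column.
final_df = tall.rename_axis("event").reset_index()
print(final_df)
```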

David R