
What I understand from this error is that there is a column of type long(), but that column contains the value '5B', which isn't a long.

This is the line where the error occurs:

df_Company = df1.groupby(by=['manufacturer','quality_issue'], as_index=False) ['quality_issue2'].count()

I have checked all the column types of the dataframe df1, but there are no columns of type long. 5B is the name of a manufacturer, so I assume the manufacturer column somehow becomes a long during this statement.

I checked what types the dataframe df1 has:

print (df1.dtypes)
manufacturer                    object
yearweek                         int64
quality_issue                   object
quality_issue2                  object

I 'think' I have to do something with df_Company.astype(long), but I can't make it work. Does anyone have an idea how to fix this?

Note: the strange thing is that on my other computer, which has Python 3.5.1, the same code works just fine, but when I run it on my current computer with Python 2.7.9 I get this long error.


1 Answer


The problem is different, see 8381, but in my pandas version 0.18.1 it works fine.

I think you can change as_index=False to as_index=True and then call reset_index:

df_Company = (df1.groupby(by=['manufacturer','quality_issue'], as_index=True)['quality_issue2']
                 .count()
                 .reset_index())
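
With as_index=True the group keys manufacturer and quality_issue stay in the index while count runs, and reset_index then moves them back into ordinary columns, so the final layout is the same one as_index=False would have produced.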

Differences between size and count: size includes NaN/None values, count does not.
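
A minimal numeric sketch of the difference (the frame df and its columns grp and val are made up for this demo):

import pandas as pd
import numpy as np

# made-up demo frame: 'val' has a NaN in group 'a'
df = pd.DataFrame({'grp': ['a', 'a', 'b'],
                   'val': [1.0, np.nan, 2.0]})

# size counts every row in each group, NaN included
print (df.groupby('grp')['val'].size())   # a -> 2, b -> 1

# count skips NaN, so group 'a' drops to 1
print (df.groupby('grp')['val'].count())  # a -> 1, b -> 1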

Sample with string values:

import pandas as pd
import numpy as np

df1=pd.DataFrame([['foo','foo','bar','bar','bar','oats'],
                  ['foo','foo','bar','bar','bar','oats'],
                  [None,'foo','bar',None,'bar','oats']]).T
df1.columns=['manufacturer','quality_issue','quality_issue2']
print (df1)
  manufacturer quality_issue quality_issue2
0          foo           foo           None
1          foo           foo            foo
2          bar           bar            bar
3          bar           bar           None
4          bar           bar            bar
5         oats          oats           oats

df_Company = (df1.groupby(by=['manufacturer','quality_issue'], as_index=False)['quality_issue2']
                 .count())
print (df_Company)

  manufacturer quality_issue  quality_issue2
0          bar           bar               2
1          foo           foo               1
2         oats          oats               1
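
count returns 2 for bar and 1 for foo because the None values in quality_issue2 are skipped; size below counts every row in each group, so bar gets 3 and foo gets 2: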

df_Company1 = (df1.groupby(by=['manufacturer','quality_issue'])['quality_issue2']
                  .size()
                  .reset_index(name='quality_issue2'))
print (df_Company1)

  manufacturer quality_issue  quality_issue2
0          bar           bar               3
1          foo           foo               2
2         oats          oats               1

I think you can omit ['quality_issue2']; the output is the same:

df_Company1 = (df1.groupby(by=['manufacturer','quality_issue'])
                  .size()
                  .reset_index(name='quality_issue2'))
print (df_Company1)
  manufacturer quality_issue  quality_issue2
0          bar           bar               3
1          foo           foo               2
2         oats          oats               1
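
Omitting the column selection works here because size counts the rows of each group regardless of any particular column; with count the selection matters, since count tallies the non-null values of the selected column.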
  • Btw, you need `count`? Not `size`? – jezrael May 23 '16 at 14:57
  • [differences](http://stackoverflow.com/a/33346694/2901002): [`size`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html#pandas.core.groupby.GroupBy.size) includes `NaN` values, [`count`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.count.html#pandas.core.groupby.GroupBy.count) does not – jezrael May 23 '16 at 14:59
  • What I was trying to do was to group by manufacturer and see which quality issues each manufacturer had, and then count how many of those quality issues each manufacturer had. Therefore I think it is better to use count instead of size (right?). Basically the quality_issue and quality_issue2 columns have exactly the same data in them. – Morganis May 23 '16 at 15:07
  • See the edit, I tried to explain with a sample. Good luck and thank you for accepting. – jezrael May 23 '16 at 15:38