
I have a .csv file (600 lines) with several fields: commit id, smell type, and more.

I would like to count the occurrences of each type of smell for each commit id.

Example of the output I would like:

   commit dfbu3u4498fbbefi: [dense structure: 1, cyclic dependency: 4, unstable dependency: 67, feature concentration: 6, god component: 8]
   commit bifueifyuwefbvwr: [dense structure: 34, cyclic dependency: 43, unstable dependency: 97, feature concentration: 43, god component: 10]

I tried this, but I think I need another loop (maybe?). Sorry, I have never used Python before.

import csv
import collections

smell = collections.Counter()


with open('Ref.csv') as file:
    reader = csv.reader(file, delimiter=';')

    for row in reader:

        smell[row[0]] += 1

print(smell.most_common(5))

OUTPUT:

[('9b0dd5dc979bd490ae34f6d790c466b47c84c920', 96), ('6431099fe7d5d90da678a78051f12894da82c68d', 96), ('44fdfa7ea93c15bb116a25e0675d98469deafaa6', 96), ('b2c40612a2c60685555f35af71f5801391a58b4b', 96), ('aa6cbb78cca17a9de339b2d060c00352e8beedde', 96)]

Or, if I change the row index to 2, I get:

[('Unstable Dependency', 315), ('Feature Concentration', 238), ('God Component', 84), ('Cyclic Dependency', 28), ('Dense Structure', 7)]

daisy
  • Kindly use this as a guide: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – sammywemmy Jan 29 '20 at 17:42
  • What is the issue, exactly? What part are you struggling with? – AMC Jan 29 '20 at 17:52
  • `df.groupby(['commit_id', 'smell']).count()`. – Marat Jan 29 '20 at 17:54
  • What have you tried so far? Can you share your code and explain your comment in detail? Please refer to the documentation that @sammywemmy mentioned. – lv10 Jan 29 '20 at 17:54
  • @lv10 thank you, I edited the question. – daisy Jan 29 '20 at 18:01
  • @Marat thank you. I don't know why, but I have some issues with the groupby function. – daisy Jan 29 '20 at 18:03
  • @daisy you need to use it on a pandas dataframe. `df = pd.read_csv('Ref.csv')` to create one – Marat Jan 29 '20 at 18:05
  • @Marat yes, I already did this. I installed pandas 0.22 and it works, but when I try to group by I get errors in the module. – daisy Jan 29 '20 at 18:11
  • Please don't share information as images unless absolutely necessary, which isn't the case here. See: https://meta.stackoverflow.com/q/303812/11301900. – AMC Jan 29 '20 at 19:13

1 Answer


You can use pandas to do it:

import pandas as pd

# Dataframe definition
df = pd.read_csv('Ref.csv', sep=';')

# Group and get the count values.

df_grouped = df.groupby(by=['commit', 'smell']).size()

`df_grouped` is now a pandas Series. If you want it to be a DataFrame again, do this:

df_grouped = df_grouped.reset_index()
df_grouped = df_grouped.rename(columns={0: "counts"})
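
To get closer to the one-line-per-commit output shown in the question, the grouped counts can be pivoted so that each smell type becomes a column. This is only a sketch: the sample rows, commit ids, and package names are made up for illustration, and the column names `commit`, `package`, and `smell` are taken from the `df.head()` output posted in the comments.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'Ref.csv' (same ';' layout).
sample = io.StringIO(
    "commit;package;smell\n"
    "aaa111;pkg.a;God Component\n"
    "aaa111;pkg.b;God Component\n"
    "aaa111;pkg.c;Cyclic Dependency\n"
    "bbb222;pkg.a;Dense Structure\n"
)

df = pd.read_csv(sample, sep=";")

# One row per commit, one column per smell type; absent pairs are filled with 0.
counts = df.groupby(["commit", "smell"]).size().unstack(fill_value=0)

# Turn each row into a plain dict of Python ints, similar to the desired output.
per_commit = {
    commit: {smell: int(n) for smell, n in row.items()}
    for commit, row in counts.iterrows()
}

for commit, smells in per_commit.items():
    print(f"commit {commit}: {smells}")
```

With the real file, replacing `sample` with `'Ref.csv'` should be enough, assuming the header matches.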

I highly recommend having a look at the documentation: https://pandas.pydata.org/pandas-docs/stable/index.html
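
If pandas keeps causing trouble, the `csv` and `collections` approach from the question can probably be extended without a second loop by keeping one `Counter` per commit. The following is a minimal sketch, not a tested solution for the real file: the sample rows are invented, and the column names `commit` and `smell` are assumed from the header shown in the comments.

```python
import collections
import csv
import io

# Hypothetical sample; for the real file use: open('Ref.csv', newline='')
sample = io.StringIO(
    "commit;package;smell\n"
    "aaa111;pkg.a;God Component\n"
    "aaa111;pkg.b;God Component\n"
    "aaa111;pkg.c;Cyclic Dependency\n"
    "bbb222;pkg.a;Dense Structure\n"
)

# Maps each commit id to its own Counter of smell types.
smells_by_commit = collections.defaultdict(collections.Counter)

# DictReader consumes the header row, so it is not counted as data.
reader = csv.DictReader(sample, delimiter=";")
for row in reader:
    smells_by_commit[row["commit"]][row["smell"]] += 1

for commit, counter in smells_by_commit.items():
    print(f"commit {commit}: {dict(counter)}")
```

Using `csv.DictReader` instead of `csv.reader` also avoids counting the header line as if it were a commit, which the index-based version in the question would do.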

sergiomahi
  • I tried but failed :( I got this: `runfile('C:/Users/daisy/.spyder-py3/untitled1.py', wdir='C:/Users/daisy/.spyder-py3') Traceback (most recent call last): File "", line 1, in runfile('C:/Users/daisy/.spyder-py3/untitled1.py', wdir='C:/Users/daisy/.spyder-py3') File "C:\Users\daisy\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile execfile(filename, namespace) ` – daisy Jan 29 '20 at 18:40
  • `File "C:\Users\daisy\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile exec(compile(f.read(), filename, 'exec'), namespace) File "C:/Users/daisy/.spyder-py3/untitled1.py", line 8, in df_grouped = df.groupby(by=['commit', 'smell']).size() File "C:\Users\daisy\Anaconda3\lib\site-packages\pandas\core\generic.py", line 7894, in groupby **kwargs` – daisy Jan 29 '20 at 18:43
  • `File "C:\Users\daisy\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2522, in groupby return klass(obj, by, **kwds) File "C:\Users\daisy\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 391, in __init__ mutated=self.mutated, File "C:\Users\daisy\Anaconda3\lib\site-packages\pandas\core\groupby\grouper.py", line 621, in _get_grouper raise KeyError(gpr) KeyError: 'commit' ` – daisy Jan 29 '20 at 18:44
  • Sorry I was using capital letters for the column names. I've edited my answer – sergiomahi Jan 29 '20 at 18:51
  • Maybe your columns have different names. Can you show me the output of `df.head()`? – sergiomahi Jan 29 '20 at 19:01
  • This is the output of `df.head()`: ` commit;package;smell 0 9b0dd5dc979bd490ae34f6d790c466b47c84c920;` – daisy Jan 29 '20 at 19:07
  • Try adding ` sep=';' ` when reading the csv file. If this doesn't work, is there a way you can share the file with me so I can test it myself? – sergiomahi Jan 29 '20 at 19:23
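
The `KeyError: 'commit'` in the traceback above is consistent with the file having been read with the default comma separator: the `df.head()` output shows `commit;package;smell` as a single column name. The sketch below reproduces that effect on a made-up two-line sample (the row values are hypothetical) and shows how `sep=';'` changes the parsed columns.

```python
import io

import pandas as pd

# Hypothetical sample in the same ';'-separated layout as the posted header.
text = "commit;package;smell\naaa111;pkg.a;God Component\n"

# The default separator is a comma, so the whole header becomes one column name.
df_default = pd.read_csv(io.StringIO(text))
print(df_default.columns.tolist())    # ['commit;package;smell']

# With sep=';' the header is split into three separate columns.
df_semicolon = pd.read_csv(io.StringIO(text), sep=";")
print(df_semicolon.columns.tolist())  # ['commit', 'package', 'smell']
```

Printing `df.columns.tolist()` right after reading the file is a quick way to check which case applies before calling `groupby`.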