
I am rather new to coding, and tutorial hell has started to take its toll. I need help graphing data where both columns are strings. I have tried transforming the data with matplotlib and pandas, but I cannot graph it because the plotting functions I have used require int-type data.

I have managed to group the data using `df.groupby(['type', 'url']).sum()`.

My current goal is to get the count for each group (how many URLs are in each type) and graph those counts. Dataset: Kaggle - Malicious Links

Edit: I had an image here. I have replaced it with a code block:

df = pd.read_csv('/content/malicious_phish.csv')
df
<output: csv contents>
df.shape
<output: 651191, 2>
df.groupby(['type', 'url']).sum()
<output: corrupted text in a table>

Not sure if this is any better

I have tried using `len()`, `.sum()`, and `.count()`. I have started reading the matplotlib and pandas documentation, looking for functions and tools that could resolve this problem.

  • Try `len(df.groupby(['type', 'url']).groups.keys())` if the number of distinct combinations is what you want – Ricardo Nov 29 '22 at 00:36
  • `df.groupby(['type', 'url']).size()` for count of each group – Ricardo Nov 29 '22 at 00:46
  • @Ricardo, with `df.groupby(['type', 'url']).size()` the output shows each URL and its associated malicious type (phishing, defacement, and so on) with a count of 1 for every row, and the text becomes more corrupted than before, though that may be down to my poor understanding of it. One row of the output: ".;dæ9 phishing 1" Edit: the corruption is on my end. Going to figure out how to resolve it. – ThereIsAGhostInMyComputer Nov 29 '22 at 01:07
  • Isn't that what you want? I don't really understand what output you are looking for. – Ricardo Nov 29 '22 at 01:11
  • Is `from collections import Counter` and `Counter(df['type'])` what you're looking for? – Ricardo Nov 29 '22 at 01:14
  • The data has many phishing, defacement and benign malicious links. But the output counts each link separately, something like this: "Link Phishing 1, Link Benign 1, Link Phishing 1", and not: "Link Phishing 2, Link Benign 1". – ThereIsAGhostInMyComputer Nov 29 '22 at 01:14
  • YES! Thank you so much! This problem has been plaguing me for 2 days, and has been pestering me for so long. Thank you once again. – ThereIsAGhostInMyComputer Nov 29 '22 at 01:16
  • @СергейКох Ah, thank you. I will be sure to avoid doing that in the future when I have an issue at hand. – ThereIsAGhostInMyComputer Nov 30 '22 at 00:33

1 Answer

from collections import Counter
# Count how many rows carry each label in the 'type' column
Counter(df['type'])

To plot the resulting dict of counts, the following link is helpful: https://stackoverflow.com/a/52572237/16353662
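As a rough sketch of how the counting and plotting steps might fit together: the small DataFrame below is a made-up stand-in for `malicious_phish.csv` (the column names `url` and `type` come from the question, the rows are hypothetical), and the bar chart is only one of several reasonable ways to draw the counts.

```python
from collections import Counter

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for malicious_phish.csv: same column names, made-up rows.
df = pd.DataFrame({
    "url": ["a.example", "b.example", "c.example", "d.example", "e.example"],
    "type": ["phishing", "benign", "phishing", "defacement", "benign"],
})

# Count how many rows carry each label in the 'type' column.
counts = Counter(df["type"])

# Bar chart: one bar per label, bar height = number of URLs with that label.
labels = list(counts.keys())
values = list(counts.values())
fig, ax = plt.subplots()
ax.bar(labels, values)
ax.set_xlabel("type")
ax.set_ylabel("number of URLs")
```

In a notebook the figure should render on its own; in a plain script, add `plt.show()` at the end. Staying within pandas, `df['type'].value_counts().plot(kind='bar')` should give much the same chart.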

Ricardo