So I have a huge file containing hundreds of thousands of lines. I want to know how many different sessions or IDs it contains. I really thought it wouldn't be that hard to do, but I'm unable to find a way.

Sessions look like this:

"session":"1425654508277"

So there will be a few thousand lines with that session, then it will switch to another value. The new value is not necessarily the old one incremented by one; I don't know the pattern, if there is one. I just want to know how many sessions appear in the document, that is, how many distinct values there are (they SHOULD be consecutive, but that's not a requirement, just something I noticed).

Is there an easy way to do this? The only things I've found that are even remotely close are Excel macros and scripts, which leads me to think I'm not asking the right questions. I also found this: "Notepad++ incrementally replace", but it does not help in my case.

Thanks in advance.

monkey intern
  • What does this have to do with JSON? –  May 10 '16 at 07:56
  • Not just `cat data | uniq | wc`? –  May 10 '16 at 07:57
  • Possible duplicate of [Show count of occurrences when smart highlighting in Notepad++](http://stackoverflow.com/questions/27793861/show-count-of-occurrences-when-smart-highlighting-in-notepad) – AdrianHHH May 10 '16 at 08:14
  • Use the answers on suggested duplicate with a regular expression search. – AdrianHHH May 10 '16 at 08:15
  • Well, it's a JSON file, with key-value expressions. So, that. Maybe people who use JSON know how to do this? @torazaburo Honestly it didn't cross my mind to do it with Unix commands, but I've never done it for a really big file. Will it be able to do it without much problem? It's kind of big in my experience. – monkey intern May 10 '16 at 08:34
  • I already saw that post, @AdrianHHH, and it does not help me at all. If I knew the regular expression to properly filter what I want, I wouldn't be here. – monkey intern May 10 '16 at 08:34
  • Your question does not say that you do not know how to do a search, it asks about counting. Please [edit] your question to explain the real problem, state clearly which parts you know how to do and on which parts you need help. Also you need to explain what exactly you are searching for, what parts are fixed and for the variable parts, what sorts of values they hold. – AdrianHHH May 10 '16 at 08:39
  • If your search string is `"session":"{{a number}}"` then the regular expression would be `"session":"\d+"`. – AdrianHHH May 10 '16 at 08:40
  • Alright, now how do I count the different instances/values of that session? Which is to say, not how many times a session appears, but how many times that number changes. – monkey intern May 10 '16 at 09:02
  • You won't be able to achieve this with Notepad++; consider using a programming language. Do you have access to PHP? It would be very easy with it. – Pedro Lobito May 10 '16 at 23:41

4 Answers

Consider using jq. You can extract the session with `[.session]`, then apply `unique`, then `length`.

https://stedolan.github.io/jq/manual/

I am no jq expert, and have not tested this, but it seems that the program

unique_by(.session) | length

might give you what you want.
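
For readers without jq, the same extract / unique / length idea can be sketched in JavaScript. This is only an illustration and assumes the data parses as a single JSON array of objects that each have a `session` key, which the question does not confirm; the sample records and the function name `countDistinctSessions` are made up.

```javascript
// Assumption: the data is a JSON array of objects that each carry a "session" key.
// The sample records below are made up for illustration.
const records = JSON.parse(`[
  {"session": "1425654508277", "event": "a"},
  {"session": "1425654508277", "event": "b"},
  {"session": "1425654509990", "event": "c"}
]`);

// Extract the session values, de-duplicate them with a Set,
// and count what is left.
function countDistinctSessions(items) {
  const sessions = items.map((item) => item.session);
  return new Set(sessions).size;
}

console.log(countDistinctSessions(records)); // 2
```

A `Set` keeps one entry per distinct value regardless of order, so the sessions do not need to be sorted or adjacent.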

  • That seems great, but I do not know how to use the tool. They do have an online one, which I'm using, but I do not know the syntax for what you told me to do. Any further help would be awesome; I'll try for a bit to see if I get it. – monkey intern May 10 '16 at 08:59

According to your profile, you know JavaScript, so you can use that:

  1. Load the file.
  2. Look for session. (If this is JSON, this could be as simple as `myJson['session']`.)
  3. Keyed on session value, add to a map, e.g. `myCounts[sessionValue] = doesNotMatter`.
  4. Count the number of keys in the map.
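
The four steps above can be sketched roughly as follows. This is a minimal, untested-on-real-data sketch: it assumes each session appears in the text exactly as `"session":"<digits>"` (no spaces around the colon), and the names `SESSION_PATTERN`, `countSessions` and `countSessionsInFile` are made up for illustration. It searches the raw text with a regular expression rather than parsing JSON, so it does not depend on how the rest of the file is structured.

```javascript
// Step 2: find every "session":"<digits>" pair with a regular expression,
// so this works even if the file as a whole is not one valid JSON document.
const SESSION_PATTERN = /"session":"(\d+)"/g;

// Steps 3 and 4: key a Map on the session value, then count its keys.
function countSessions(text) {
  const counts = new Map();
  for (const match of text.matchAll(SESSION_PATTERN)) {
    const id = match[1];
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return counts;
}

// Step 1 (Node.js only): load the whole file into memory and scan it.
function countSessionsInFile(path) {
  const fs = require('fs');
  return countSessions(fs.readFileSync(path, 'utf8'));
}

// Made-up sample in the format shown in the question.
const sample = [
  '{"session":"1425654508277","x":1}',
  '{"session":"1425654508277","x":2}',
  '{"session":"1425654509990","x":3}',
].join('\n');

const counts = countSessions(sample);
console.log(counts.size); // 2 distinct sessions
console.log(counts.get('1425654508277')); // 2 lines carry this session
```

With Node.js, something like `countSessionsInFile('sessions.txt').size` would give the number of distinct sessions in a file of that (hypothetical) name. Reading the file in one go should be fine for a few hundred thousand lines; a much larger file would call for reading it line by line instead.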

There are easier ways, like torazaburo's suggestion to use cat data | uniq | wc, but it doesn't sound like you want to learn Unix, so you may as well practice your JavaScript (I do this myself when learning programming languages: use it for everything).

DavidS

You won't be able to achieve this with Notepad++, but you can use a Linux shell command, e.g.:

grep -o '"session":"[0-9]*"' sessions.txt | sort -u | wc -l
Pedro Lobito

Adding to my own question: if you manage to get the strings you want into a column in Excel, the Filter option automatically lists the different values that appear in that column.

Applied to my case, this means: if I get every key-value pair ("session":"idSession", the 100000 values, one per row) into a single column, apply the filter, and count the listed values manually, I get the number of different values.

I didn't get to try the wc/Unix option because I found this while trying to apply the other method.

monkey intern