
I have a huge .txt file of around 7 GB, which I'm processing using the approach from the thread "Reading very large files in PHP".

A sample of the .txt file:

cat
dog
dog
mouse
cat
bird
dog
cat
...

Now I need to count how many times each line occurs, producing something like:

[cat] -> 3
[dog] -> 3
[mouse] -> 1
[bird] -> 1

Could you point me in the right direction? Thank you for your time and advice.

George Profenza
Fedor
  • You did not explain your problem very well. What are you trying to do? – Giacomo M Aug 06 '19 at 07:43
  • Usually it involves opening the file, reading its contents, and then using an array to gather the counts. But then you need to start coding. – Kevin Aug 06 '19 at 07:44
  • I'm trying to count the occurrences of each unique line in my file, e.g. cat appears 3 times, dog 3 times, ... – Fedor Aug 06 '19 at 07:44
  • I'm opening the file and reading it line by line; now I need to figure out where to store the lines and how to count the occurrences of each one. – Fedor Aug 06 '19 at 07:45

2 Answers


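You can try array_count_values().

To use it, you first have to convert your .txt file into an array. The main problem is that I don't know whether the file size will cause an issue: the whole file has to be loaded into memory at once, and for a 7 GB file that may well exceed PHP's memory_limit.

To convert the file contents into an array, you can use explode(), splitting on the newline character.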
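A minimal sketch of the explode() + array_count_values() approach. The function name countLines() is hypothetical, and it assumes the whole file content fits in memory as one string, which is exactly the concern for a 7 GB file:

```php
<?php

// Count how often each line occurs in a string of newline-separated lines.
// Assumes the entire string fits in memory.
function countLines(string $contents): array
{
    // Drop the trailing newline first so explode() doesn't produce an
    // empty last element, then split into individual lines.
    $lines = explode("\n", rtrim($contents, "\r\n"));

    // Strip any "\r" left over from Windows line endings.
    $lines = array_map(fn($line) => rtrim($line, "\r"), $lines);

    // array_count_values() maps each distinct value to its number of occurrences.
    return array_count_values($lines);
}

// Hypothetical usage with a real file (needs enough memory for the whole file):
// $counts = countLines(file_get_contents('my_very_large_file.txt'));
$counts = countLines("cat\ndog\ndog\nmouse\ncat\nbird\ndog\ncat\n");
print_r($counts);
```

The keys keep the order in which each line first appears, so the sample above yields cat, dog, mouse, bird.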

If you need these counts every time your script runs, consider storing the result in a database. That way you can fetch the data directly instead of processing the whole file again.
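A sketch of the "compute once, reuse later" idea. As an assumption, it uses a plain cache file written with serialize() as a lightweight stand-in for a database table; the function name getLineCounts() and the cache path are hypothetical. The $compute callback would hold the actual counting loop:

```php
<?php

// Return cached counts if a cache file exists; otherwise run the expensive
// $compute callback once, store its result, and return it.
function getLineCounts(string $cacheFile, callable $compute): array
{
    if (is_file($cacheFile)) {
        // allowed_classes => false: only plain arrays and scalars are restored.
        $cached = unserialize(file_get_contents($cacheFile), ['allowed_classes' => false]);
        if (is_array($cached)) {
            return $cached;
        }
    }

    // Cache miss: do the full pass over the file once, then save the result.
    $counts = $compute();
    file_put_contents($cacheFile, serialize($counts));
    return $counts;
}

// Hypothetical usage; the closure would contain the real counting loop.
// $counts = getLineCounts(__DIR__ . '/line_counts.cache', function (): array {
//     return [/* counts computed from my_very_large_file.txt */];
// });
```

With a real database, the file read and write would become a query against a table of (line, count) rows; the control flow stays the same.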

Have fun :)

Lucas

This is a very basic example. It reads the text file one line at a time and counts identical lines.

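<?php

$fn = fopen("my_very_large_file.txt", "r");
if ($fn === false) {
    die("Cannot open my_very_large_file.txt");
}

$wordCounter = [];

// fgets() returns false at end of file, so test its result directly;
// a while(!feof()) loop can run one extra time and count a bogus false value.
while (($line = fgets($fn)) !== false) {
    // fgets() keeps the line ending; strip it so "cat\n" and a final "cat"
    // without a trailing newline are counted as the same word.
    $word = rtrim($line, "\r\n");
    if (isset($wordCounter[$word])) {
        $wordCounter[$word]++;
    } else {
        $wordCounter[$word] = 1;
    }
}

fclose($fn);

echo "<pre>";
print_r($wordCounter);
echo "</pre>";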

This will be slow on a 7 GB file. However, that's not really a problem if you only need to run the code once. If you need it more often, you'll have to find a way to speed it up.
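One possible speed-up, sketched under assumptions: read the file in large blocks with fread() and split them yourself, rather than calling fgets() once per line. The function name countLinesChunked() and the 1 MB default block size are hypothetical, and whether this is actually faster depends on your PHP version and disk, so time it on your own data:

```php
<?php

// Count identical lines by reading the file in large blocks instead of
// one fgets() call per line. The block size is an arbitrary default; tune it.
function countLinesChunked(string $path, int $chunkSize = 1 << 20): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $counts = [];
    $leftover = ''; // incomplete line carried over from the previous block

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $lines = explode("\n", $leftover . $chunk);
        // The last piece may be cut mid-line; keep it for the next block.
        $leftover = array_pop($lines);

        foreach ($lines as $line) {
            $line = rtrim($line, "\r"); // tolerate Windows line endings
            $counts[$line] = ($counts[$line] ?? 0) + 1;
        }
    }
    fclose($handle);

    // A final line without a trailing newline still counts.
    $leftover = rtrim($leftover, "\r");
    if ($leftover !== '') {
        $counts[$leftover] = ($counts[$leftover] ?? 0) + 1;
    }

    return $counts;
}
```

Only one block plus the counts array is held in memory at a time, so memory use grows with the number of distinct lines rather than with the file size.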

KIKO Software
  • Thank you! I had to raise the memory limit with ini_set('memory_limit', '1024M'); and now it runs smoothly. – Fedor Aug 06 '19 at 08:31
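When raising the limit as in the comment above, it can help to check how much memory the script actually peaked at. A minimal sketch: the 1024M value comes from the comment, and the counting loop is left as a placeholder:

```php
<?php

// Raise the limit for this script only (same value as in the comment).
ini_set('memory_limit', '1024M');

$wordCounter = [];
// ... fill $wordCounter with the line-counting loop here ...

// $wordCounter grows with the number of *distinct* lines, not with the
// file size, so the peak shows how much headroom the limit really needs.
printf("Peak memory: %.1f MB\n", memory_get_peak_usage(true) / (1024 * 1024));
printf("Distinct lines: %d\n", count($wordCounter));
```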