
I am trying to find a better algorithm than this PHP function, because it makes the memory limit explode when running on a string that is 1000000000000 characters long:

substr_count($string, $needle, $offset, $length)

Any help will be appreciated.

  • If it is a performance issue, you could first check whether the needle is present at all (https://stackoverflow.com/a/5821716/5026957), and only count occurrences when it is found. Another alternative is to use `explode()` to split the target string on the needle and apply `count() - 1` to the resulting array, as sketched below these comments. – HarisH Sharma Oct 07 '19 at 08:09
  • *"is making memory limit explode"*: How much memory do you actually have available? – trincot Oct 07 '19 at 08:28
  • Please produce a minimal verifiable example. Also, where is the bottleneck? In file reading (as in taking the string as input) or in the substr_count() processing? – nice_dev Oct 07 '19 at 09:19
  • The problem, @harishsharma, is that I am counting the number of 'a' characters in that 1000000000000-character string, so there is no way to divide it; and if I did divide it, I would have to count in every chunk, which amounts to the same work. – Mostafa A. Hamid Oct 08 '19 at 01:26
  • @trincot I do not know the exact memory limit; I got this in a HackerRank problem-solving question, and the memory limit of the machine hosting the code is not disclosed. – Mostafa A. Hamid Oct 08 '19 at 01:29
  • @vivek_23 Honestly, I do not read from a file; the 1000000000000-character string is passed directly as a parameter to the implementation function. – Mostafa A. Hamid Oct 08 '19 at 01:30
  • You mean that on the HackerRank server they pass your code a string that takes 1 TB of memory!? Even in 2019 that still sounds extreme. Can you give the link? – trincot Oct 08 '19 at 05:00
  • @MostafaA.Hamid If there are major performance issues while the user is on the website, you can run your PHP algorithm in the background, or you can use a Shell/Bash script and run it via PHP. I have also used Shell/Bash scripts for long calculations and report processing. – HarisH Sharma Oct 09 '19 at 05:09
  • @harishsharma No, it is just an algorithm question on hackerrank.com – Mostafa A. Hamid Oct 16 '19 at 05:32
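
For reference, the `explode()` trick from the first comment can be sketched like this. It still holds the entire string in memory, so it addresses only the counting step, not the memory limit; the sample values are illustrative, not from the question:

```php
<?php
// Illustrative haystack and needle, not the actual HackerRank input.
$haystack = 'banana';
$needle   = 'a';

// explode() splits the string on the needle; N occurrences
// always produce N + 1 parts, hence count() - 1.
$occurrences = count(explode($needle, $haystack)) - 1;  // 3
```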

1 Answer


If you have such a huge file, you will indeed run into problems.

I think you should write your own code using a file handle, and read the data in chunks.

Have a look at fopen() and fread():

Open a file handle to your huge file with fopen, then read chunks of data with fread. You must provide the logic yourself to handle the case where the string you are looking for is broken between two chunks of data.

That way you will NOT have the whole file in memory, only the parts you read in with fread().

fopen()

fread()
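
A minimal sketch of that chunked approach, assuming the input is available as a file; the function name, chunk size, and overlap handling are my own illustration rather than part of the answer:

```php
<?php
// Count occurrences of $needle in a (potentially huge) file without
// loading the whole file into memory. The 1 MiB chunk size is an assumption.
function countNeedleInFile(string $path, string $needle, int $chunkSize = 1048576): int
{
    $handle = fopen($path, 'rb');        // read-only, binary-safe
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count   = 0;
    $overlap = '';                       // tail carried over from the previous chunk
    $keep    = strlen($needle) - 1;      // bytes needed to catch a split needle

    while (!feof($handle)) {
        $data = fread($handle, $chunkSize);
        if ($data === false || $data === '') {
            break;                       // nothing more to read
        }
        // Prepend the carried-over tail so a needle broken between
        // two chunks is still seen by substr_count().
        $chunk  = $overlap . $data;
        $count += substr_count($chunk, $needle);
        // Keep the last strlen($needle) - 1 bytes for the next round.
        $overlap = $keep > 0 ? substr($chunk, -$keep) : '';
    }

    fclose($handle);
    return $count;
}
```

Carrying over exactly `strlen($needle) - 1` bytes is the smallest overlap that catches a needle split across two chunks, and because the overlap is shorter than the needle itself, it can never contain a complete match of its own, so nothing is counted twice.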

Erwin Moller