
So I have a LOT (millions) of records which I am trying to process. I've tried MongoDB and Neo4j, and both simply grind my dual-core Ubuntu box to a halt.

I am wondering (though I suspect the answer is no) whether there is any way to store PHP arrays in a file but load only one array into memory. So for example:

<?php
$loaded = array('hello','world');
$ignore_me = array('please','ignore');
$ignore_me2 = array('please','ignore','again');
?>

So effectively I could access the $loaded array while the others aren't loaded into memory (even though they're in the same file)? I know about fopen/fread, but those tend to be used where the file is a general block of text.

If (as I suspect) the answer is no, how does something like a NoSQL database avoid a) creating a file per record and b) loading everything into memory? I know Neo4j uses Java, but PHP should be able to match that!

UKUser35
  • http://stackoverflow.com/questions/10151070/lightweight-database-sql-or-nosql I would use MySQL – capcj May 18 '17 at 15:08
  • Try [Serializing the array](https://stackoverflow.com/questions/8641889/how-to-use-php-serialize-and-unserialize)? – Martin Jul 30 '17 at 08:31
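
A minimal sketch of the serialize-one-record-per-line idea from the comment above (the file path and the base64 wrapping are assumptions; base64 keeps each serialized record on a single line so it can be read back with fgets, one record in memory at a time):

<?php
// Write: one serialized record per line.
$records = array(
    array('hello', 'world'),
    array('please', 'ignore'),
    array('please', 'ignore', 'again'),
);
$fh = fopen('/tmp/records.dat', 'w');
foreach ($records as $record) {
    fwrite($fh, base64_encode(serialize($record)) . "\n");
}
fclose($fh);

// Read: only the current line (one record) is held in memory at a time.
$fh = fopen('/tmp/records.dat', 'r');
while (($line = fgets($fh)) !== false) {
    $record = unserialize(base64_decode(trim($line)));
    // process $record here...
}
fclose($fh);
?>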

2 Answers

1

Did you consider relational databases such as MySQL, PostgreSQL, or MS SQL Server?

I see that you tried MongoDB, a document-oriented database, and Neo4j, a graph database.

I know that NoSQL is a big trend, but I tried NoSQL with my collections of millions of records and it performed so badly that I switched back to relational SQL.

If you still insist on going with NoSQL, try Redis or Memcached; they are in-memory stores.
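
For what it's worth, a minimal sketch of the Redis route, assuming the phpredis extension is installed and a server is listening on 127.0.0.1:6379 (the key names are made up for the example):

<?php
// Assumes the phpredis extension and a local Redis server.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// One record per key; only the record you ask for is pulled into PHP memory.
$redis->set('record:1', json_encode(array('hello', 'world')));
$redis->set('record:2', json_encode(array('please', 'ignore')));

$loaded = json_decode($redis->get('record:1'), true); // array('hello', 'world')
?>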

brian3t
  • Hi there - I did. I was using a MySQL database and hit about 20m rows, and I was finding that my insert queries were taking 40/50 seconds to insert about as many rows, which when I'm trying to process 100m+ records was just too long. I also tried using the filesystem but hit an issue with inodes (otherwise that would have been my preference) – UKUser35 May 18 '17 at 17:00
  • I see. Have you considered these: 1) Do you actively need all 100M records, or is there a period after which you can archive them? Can you split them into different tables? 2) Enable the "Slow Query Log" and study what took 40/50 seconds – brian3t May 19 '17 at 14:26
  • 3) Optimize queries, tables and indexes 4) Make sure you use InnoDB 5) Use an SSD hard disk. Keep the inserted table simple and link it to another table. For example, if you have table `employee` with 100 million records, you can keep only id, name, email, phone on that table. Create another table called employee_extra_attr and store address1, address2, note, language, etc. there – brian3t May 19 '17 at 14:33
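
To illustrate the batching and optimization advice in the comments above, a rough sketch of chunked inserts inside transactions with PDO (the DSN, credentials, table and column names are made up for the example):

<?php
// Hypothetical table: employee(id, name, email).
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$rows = array(                       // example data; in practice this comes from your source
    array('Alice', 'alice@example.com'),
    array('Bob', 'bob@example.com'),
);

$stmt = $pdo->prepare('INSERT INTO employee (name, email) VALUES (?, ?)');

$pdo->beginTransaction();
$i = 0;
foreach ($rows as $row) {
    $stmt->execute($row);
    if (++$i % 10000 === 0) {        // commit in chunks so each transaction stays bounded
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
$pdo->commit();
?>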
0

You could use PHP Streams to read/write to a file.

Read file and convert to array

$content = file_get_contents('/tmp/file.csv'); // file.csv contains a,b,c
$csv = explode(',', $content); // $csv is now array('a', 'b', 'c')

Write to file

$line = array("a", "b", "c");
// create stream
$file = fopen("/tmp/file.csv","w");
// add line as csv
fputcsv($file, $line);
// close stream
fclose($file);

You can also loop to append lines to a CSV, and loop to read them back. https://secure.php.net/manual/en/function.fputcsv.php

You can retrieve lines one at a time with fgetcsv, which advances the file pointer to the next line on each call. https://secure.php.net/manual/en/function.fgetcsv.php
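
For example, a minimal loop over the /tmp/file.csv written above, holding only the current row in memory:

$file = fopen('/tmp/file.csv', 'r');
while (($row = fgetcsv($file)) !== false) {
    // $row holds only the current line, e.g. array('a', 'b', 'c')
    // process $row here...
}
fclose($file);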

JacobW
  • I would add a serious qualifier that I have found [numerous](https://stackoverflow.com/questions/13266146/excel-csv-export-into-a-php-file-with-fgetcsv) [issues](https://stackoverflow.com/questions/8882090/utf-8-problems-while-reading-csv-file-with-fgetcsv) with the PHP `fgetcsv`/`fputcsv` functions; `fgetcsv` is extremely picky about *exactly* what sort of csv it is, the splitter/field characters chosen and the character sets (BOM/no BOM) used in the file. – Martin Jul 30 '17 at 08:35