
This is my init.php, which is loaded across the whole website:

$suid = 0;
session_set_cookie_params(60, '/', '.' . $_SERVER['HTTP_HOST'], true);
session_save_path(getcwd() . '/a/');
if (!isset($_SESSION['id'])) {
    session_start(['cookie_lifetime' => 60]);
    $_SESSION['id'] = session_id();
    $_SESSION['start'] = date('d_m_Y_H_i');
    $_SESSION['ip'] = $_SERVER['REMOTE_ADDR'];
} elseif (isset($_SESSION['uid'])) {
    $suid = $_SESSION['uid'];
}

I'm currently testing PHP sessions, so I've set the lifetime to just 60 seconds.

I was wondering why sessions were being created at all, since nobody knows the domain yet, so I added the IP address to the session data. I looked one of the addresses up and found this:

[Screenshot: an IP lookup identifying the address as a Google crawler]

So it was the Google crawler bot. Since there are plenty of other search engines and bots out there, I don't want to store these crawls as session files and fill up my web space with them.

So my questions are:

1) Even when the test lifetime (60 seconds) is over, the session file remains in the custom directory. I read that this is because I set a custom directory. Is this true?

2) What would be an efficient way to delete all unused/expired session files? Should I add $_SESSION['last_activity'] with a timestamp and have a cronjob scan my custom directory, read the session file data, work out which sessions have expired and delete them?

3) Should I avoid saving these unneeded sessions created by crawler bots by simply looking for the string "bot" inside $_SERVER['HTTP_HOST'], or is there a better way to identify "non-human visitors"/crawlers?

I'd also appreciate any improvements/suggestions to my code at the top. I recently caused some Internal Server Errors because session_start() was being called too often, as far as I can tell from the php-fpm slow logs.

AlexioVay

2 Answers


1) Even when the test lifetime (60 seconds) is over, the session file remains in the custom directory. I read that this is because I set a custom directory. Is this true?

No, the custom directory is picked up by the session GC and the files will be cleaned up. It just doesn't happen immediately.

2) What would be an efficient way to delete all unused/expired session files? Should I add $_SESSION['last_activity'] with a timestamp and have a cronjob scan my custom directory, read the session file data, work out which sessions have expired and delete them?

PHP 7.1 has session_gc(), which you can call from a cronjob and it will do everything necessary.
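
On PHP 7.1+, such a cron script can be as small as the sketch below. The save path is an assumption here; point it at whatever directory your site actually uses.

<?php
// Rough sketch of a session GC cron script for PHP >= 7.1.
// The save path below is hypothetical - use the same directory as your site.
session_save_path('/var/www/example/a');
session_start();   // an active session is needed before session_gc() can run
session_gc();      // immediately removes all expired session files
session_destroy(); // discard the throwaway session this script just created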

On older PHP versions, you'd rely on the probability-based GC by default, where cleanups are triggered at random on a small fraction of requests.
This may not be particularly efficient, but it has been the only universal solution for over a decade, so ...
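
For reference, the probability-based GC is controlled by three ini directives. The values below are only illustrative, not a recommendation:

// With these settings, roughly 1 in 100 requests triggers a GC run that
// deletes session data older than session.gc_maxlifetime seconds.
ini_set('session.gc_probability', 1);
ini_set('session.gc_divisor', 100);
ini_set('session.gc_maxlifetime', 1440);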

However, if your server runs Debian, it likely has session.gc_probability set to 0 and uses a Debian-specific cron script to do the cleanup at regular intervals. You will have problems with a custom directory in that case, and there are a few options:

  • Manually re-enable session.gc_probability.
  • Configure session.save_path directly in your php.ini, so the default cron script can pick it up.
  • Don't use a custom dir. Given that you currently have getcwd().'/a/', I'd say the default sessions dir on Debian is almost certainly a more secure location, so it would objectively be a better one.
  • Write your own cronjob to do that, but you really have to know what you're doing. $_SESSION['last_activity'] is not even usable for this; the access/modification times provided by the file system itself are (a rough sketch follows this list).
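
If you do go the custom-cronjob route, a minimal sketch might look like this. The directory is hypothetical, and the lifetime should match your session.gc_maxlifetime:

<?php
// Rough sketch of a hand-rolled cleanup job for a custom session directory.
$dir         = '/var/www/example/a';                            // hypothetical save path
$maxLifetime = (int) ini_get('session.gc_maxlifetime') ?: 1440; // in seconds

foreach (glob($dir . '/sess_*') ?: [] as $file) {
    // Delete files whose last modification is older than the allowed lifetime.
    if (is_file($file) && filemtime($file) < time() - $maxLifetime) {
        unlink($file);
    }
}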

3) Should I avoid saving these unneeded sessions created by crawler bots by simply looking for the string "bot" inside $_SERVER['HTTP_HOST'], or is there a better way to identify "non-human visitors"/crawlers?

You're thinking of $_SERVER['HTTP_USER_AGENT'], but no - this isn't a solution.

It's little known (or largely ignored, for convenience), but the only way to do this correctly is to never start a session before login.

The annoyance of crawlers triggering useless session files is a negligible issue; the real concern is a determined attacker's ability to fill up your session storage, use up all possible session IDs, or bypass session.use_strict_mode. None of these attacks is easy to pull off, but they can result in DoS or session fixation, so they shouldn't be dismissed as possibilities either.
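
A minimal sketch of the "no session before login" approach - the credential check and the form field names here are hypothetical placeholders:

<?php
// Only resume a session if the client already presents a session cookie;
// otherwise, start a brand-new one only after a successful login.
if (isset($_COOKIE[session_name()])) {
    session_start();
} elseif (isset($_POST['username'], $_POST['password'])
    && credentials_are_valid($_POST['username'], $_POST['password'])) { // hypothetical check
    session_start();
    session_regenerate_id(true); // extra safety against session fixation
    $_SESSION['uid'] = lookup_user_id($_POST['username']); // hypothetical helper
}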

P.S. Bonus tip: don't use $_SERVER['HTTP_HOST'] - that's user input, from the HTTP Host header. It might be safe in this case because of how cookies work, but it should be avoided in general.
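
For example, the cookie domain from the question could be hard-coded instead of derived from the Host header ('.example.com' is a placeholder for your real domain):

// Hard-code the cookie domain rather than deriving it from $_SERVER['HTTP_HOST'];
// '.example.com' stands in for your actual domain.
session_set_cookie_params(60, '/', '.example.com', true);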

Narf
  • Wow, thanks for that detailed answer! I didn't know that it doesn't happen immediately. I haven't tracked my custom dir yet, but the files just never seem to get deleted. So you suggest using `filemtime()` to detect the last modification time of the files? About setting sessions without a logged-in user and using a custom dir: I'm sending ajax requests across 2 different domains on the same server and used sessions to get the right language code, which I saved in a session variable. Also inside the registration form (3 steps). Should that be replaced with regular cookies to avoid using sessions without a logged-in user? – AlexioVay Apr 05 '17 at 10:21
  • Sorry for this second comment, I just wanted to make it clear: the ajax and custom dir issue is about detecting the right session to refer to that domain. I couldn't find another way to do this without setting the save dir. – AlexioVay Apr 05 '17 at 10:24
  • Yes, a simple cookie is more than fine for a simple language identifier. Sessions are for private data and authentication - you don't need that to pick a display language. :) – Narf Apr 05 '17 at 10:28
  • Forgot about `filemtime()` ... that depends on how you want it to behave. You probably want `fileatime()` or `filectime()` instead. – Narf Apr 05 '17 at 10:32
  • Great, you helped me a lot! On the other points you mentioned: sadly I can't access/configure my server on this shared host, so I need to use this option. But I'm reading about Redis and trying to implement that with Heroku. What do you think about that? Until then I will use sessions the way you suggested. Do you have another suggestion for the custom dir issue? Because, like I said, the ajax requests are forcing me to save it on one specific domain to detect the correct session in the ajax file. Or are there other options? – AlexioVay Apr 05 '17 at 10:42
  • Cache stores like Memcache or Redis will make the GC problem go away - yes, but other than that I've listed all the solutions. Though, I still think you don't need sessions in the first place. – Narf Apr 05 '17 at 11:02
  1. cleanup php session files

  2. This cronjob already exists (see 1.). The most efficient way is to store the session data in memcached instead of plain files, because of the in-memory storage and built-in TTL handling (a rough configuration sketch follows this list).

  3. You should avoid comparing strings against user agents or hosts because it is unreliable. Also, HTTP_HOST is your local host name, not the remote host's name. And the most significant reason why you shouldn't do anything different for the Google bot: you would be faking the behaviour of the website, which is very bad for your Google ranking. Welcome Google like any other visitor of the website.
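
For completeness, pointing the session handler at memcached (point 2) is mostly a configuration change. This is only a rough sketch: it requires the memcached PHP extension, the host and port are placeholders, and the same two directives could go into php.ini instead.

ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', '127.0.0.1:11211');
session_start();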

Daniel W.
  • Thanks for your helpful information. Unfortunately I'm on a shared host and can't enable memcache. Is there another way that doesn't require access to the server configuration? – AlexioVay Apr 05 '17 at 09:28
  • @Vay No, not really. But on the other hand, the files are fine where they are; they are not a problem. If they were deleted immediately, there would be constant I/O on the disk, which is bad. – Daniel W. Apr 05 '17 at 10:04