
I have been running Nutch crawl commands for the past 3 weeks, and now I get the error below when I try to run any Nutch command:

Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory file: /tmp/hsperfdata_user/27050 Try using the -Djava.io.tmpdir= option to select an alternate temp location.

Error: Could not find or load main class ___.tmp.hsperfdata_user.27055

How do I solve this issue?

– peter

2 Answers


Yeah, this really is an issue with the space available on the volume your /tmp is mounted on. If you are running this on EC2, or any cloud platform, attach a new volume and mount your /tmp on it. If running locally, there is no option other than cleaning up to make more room.

Run df -h to see the percentage used and the space available on each volume mounted on your instance. You will see something like:

Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda1            7.9G  7.9G     0 100% /
tmpfs                  30G     0   30G   0% /dev/shm
/dev/xvda3             35G  1.9G   31G   6% /var
/dev/xvda4             50G   44G  3.8G  92% /opt
/dev/xvdb             827G  116G  669G  15% /data/1
/dev/xvdc             827G  152G  634G  20% /data/2
/dev/xvdd             827G  149G  637G  19% /data/3
/dev/xvde             827G  150G  636G  20% /data/4
cm_processes           30G   22M   30G   1% /var/run/cloudera-scm-agent/process

You will begin to see this error once a disk fills up, as the root filesystem has here: /dev/xvda1 is at 100% with 0 bytes available.
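
If you need to find what is actually consuming the space before cleaning up, a minimal sketch with standard Linux tools (paths are illustrative; adjust to your layout):

# Which filesystem /tmp is on, and how full it is
df -h /tmp

# Largest entries directly under /tmp, biggest first
du -sh /tmp/* 2>/dev/null | sort -rh | head

# List (review before deleting) your own /tmp entries older than a day
find /tmp -maxdepth 1 -user "$USER" -mtime +1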

– Kingz

I think the temporary location that was being used has filled up. Try using some other location. Also, check the number of free inodes on each partition and clear up some space.
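
For the inode check, df has a separate flag; a partition can report free bytes yet still be unusable because it has run out of inodes:

df -i    # an IUse% of 100% means the filesystem has no free inodes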

EDIT: There is no need to change /tmp at the OS level. We want Nutch and Hadoop to use some other location for storing temp files. Look at this to do that: What should be hadoop.tmp.dir?
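
As a sketch of what that looks like, here is the relevant property as it would appear in conf/nutch-site.xml (since, as the comments below note, a local install may have no hadoop-site.xml; the path is a placeholder):

<property>
  <name>hadoop.tmp.dir</name>
  <!-- placeholder: point this at any partition with enough free space -->
  <value>/mnt/crawl/tmp</value>
</property>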

– Tejas Patil
  • How do I change the temporary location? Also, I don't know how to check the number of free inodes and clear the space. – peter Jan 12 '13 at 06:08
  • Don't worry man. Just google it to get the command. If you are not geeky, the safest option will be to clear off all files from /tmp which belong to the user running the nutch process and were created long back, like more than 24 hours ago. – Tejas Patil Jan 12 '13 at 06:14
  • There is almost 3 GB of data inside the /tmp/hadoop-user/mapred/local/taskTracker/user/ folder. Can I safely delete the contents of this folder? It won't affect the Nutch crawling, right? I am using Nutch 2.1 with MySQL. Can I also delete the files inside the folder /tmp/hadoop-user/mapred/staging/? – peter Jan 12 '13 at 07:00
  • If no Nutch or Hadoop process is running, then you can go ahead and delete those things. – Tejas Patil Jan 12 '13 at 07:06
  • Nothing is crawling right now because of the space error, so I guess I can delete the folder contents? I want to make sure, because we had a lot of trouble installing Nutch in the first place. – peter Jan 12 '13 at 07:56
  • I deleted the contents of /tmp/hadoop-user/mapred/local/taskTracker/user/, but when I run the nutch command again it shows the disk as 100% used, i.e. 7.5G of the 7.9G. So can you help me with the command to change the tmp directory? I tried searching on Google, but I can't find the hadoop-site.xml file to change the location. I have almost 140G free on the other partition, so I want to move the tmp location over there. – peter Jan 12 '13 at 09:57
  • It is not easy to move `/tmp` on a running system. I suggest you increase your swap space, delete as much as possible in `/tmp`, and set it to mount a `tmpfs` on reboot. This will use swap space for temporary storage, which is faster, and it can use space across multiple filesystems. – Peter Lawrey Jan 12 '13 at 11:57
  • @TejasP Thanks a lot, I managed to change the directory by inserting the following in nutch-site.xml, as there was no hadoop-site.xml or mapred-site.xml file: hadoop.tmp.dir = /mnt/crawl/ – peter Jan 13 '13 at 18:18
  • Now I am getting Error: Could not find or load main class ___.tmp.hsperfdata_user.6777. This looks like a Hadoop error. I had added the mapred.system.dir property in nutch-site.xml, and that moved the mapred temp files, but not the hadoop temp files via the hadoop.tmp.dir property. I have tried setting the hadoop.job.history.user.location property and I still get the error. How do I move the hsperfdata_user folder to another location (see the sketch below the comments)? – peter Jan 14 '13 at 04:16
  • @peter: were you running in local mode or hadoop mode? – Tejas Patil Jan 15 '13 at 10:44
  • @TejasP I am running Nutch in Hadoop mode. – peter Jan 24 '13 at 05:51
  • Hello, I would like to ask if you solved the problem. I have the exact same problem right now. When I type "df" into a console, I see that my /dev/mapper/ir-root 32060300 32060300 0 100% / is full... and I also get the Error: Could not find or load main class ___.tmp.hsperfdata_... error. – Jan Bouchner Sep 12 '13 at 12:38
  • @JohnnyGreenwood do you mean that there is no more disk space left on the partition where the tmp files are generated by Nutch? If so, please free up some space. – Tejas Patil Sep 13 '13 at 00:14
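
On the hsperfdata question in the thread above: the hsperfdata_<user> directory is where the HotSpot JVM keeps its per-process performance-counter files, and the JVM warning in the question itself points at -Djava.io.tmpdir as the way to relocate it. A minimal sketch, assuming a setup where bin/nutch honors the NUTCH_OPTS environment variable (true for common Nutch releases; check your bin/nutch script) and using /mnt/crawl/tmp as a placeholder path:

# Create a temp dir on a partition with free space (placeholder path)
mkdir -p /mnt/crawl/tmp

# Relocate the JVM temp dir; adding -XX:-UsePerfData instead disables
# the hsperfdata files altogether (at the cost of tools like jps no
# longer seeing the process)
export NUTCH_OPTS="-Djava.io.tmpdir=/mnt/crawl/tmp"

bin/nutch <command>    # then run your usual crawl commands as before

If the error comes from Hadoop task JVMs rather than the local client (peter is in Hadoop mode), the analogous knob in Hadoop 1.x is the mapred.child.java.opts property.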