Highest Voted 'solr-cell' Questions

18 votes, 6 answers

Indexing PDF with Solr

Can anyone point me to a tutorial? My main experience with Solr is indexing CSV files, but I cannot find any simple instructions or tutorial that tells me what I need to do to index PDFs. I have seen this:…

asked Jul 14 '11 at 13:57

Mark


7 votes, 1 answer

Tika Solr integration

I am trying to index using a curl-based request. The request is: curl "http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf" On submitting…

solr full-text-search apache-tika solr-cell

asked May 31 '11 at 11:28

naveen gupta

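The curl request quoted in this question passes its Solr Cell options as query parameters on `/update/extract`. As a minimal sketch, assuming the same core URL as the question (`http://localhost:8080/solr1`), the same URL can be assembled in Python with proper escaping. The helper name `extract_url` is hypothetical; only the parameter names (`literal.id`, `uprefix`, `fmap.content`, `commit`) come from the question.

```python
from urllib.parse import urlencode


def extract_url(base: str, doc_id: str, commit: bool = True) -> str:
    """Build an /update/extract URL with the same Solr Cell parameters
    as the curl command in the question (hypothetical helper)."""
    params = [
        ("literal.id", doc_id),            # value stored in the document's id field
        ("uprefix", "attr_"),              # prefix for extracted fields not in the schema
        ("fmap.content", "attr_content"),  # map Tika's "content" field to attr_content
        ("commit", "true" if commit else "false"),  # commit after this request
    ]
    # urlencode escapes reserved characters: a space becomes '+', '&' becomes %26
    return f"{base.rstrip('/')}/update/extract?{urlencode(params)}"


print(extract_url("http://localhost:8080/solr1", "who.pdf"))
# http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true
```

The sketch only builds the URL; the PDF itself would still be sent as a multipart form field, as curl's `-F "myfile=@…"` option does. Passing `commit=true` on every request makes each document searchable immediately, at the cost of one commit per file.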

7 votes, 3 answers

How do I index documents in SOLR?

I'm running Solr 1.4 on Ubuntu 10.04 (installed via apt-get solr-tomcat) and it seems to be working fine. I'm having some difficulty finding any coherent info on how to index documents, though. I'm new to SOLR, so bear with me! I have a folder…

solr full-text-search apache-tika solr-cell

asked May 10 '10 at 10:48

Shane


5 votes, 2 answers

How can I use the latest version of the Sunspot gem with Solr Cell?

I've been trying (in vain) to get the latest version of the Sunspot gem (currently 2.0.0.pre.111215, incorporating Solr 3.5) working with Solr Cell. Currently I am using the older version of Sunspot in combination with Solr Cell provided by the…

ruby-on-rails solr sunspot solr-cell

asked Jan 20 '12 at 14:06

Simmo


5 votes, 1 answer

Is there a best practice schema.xml for SOLR when importing rich documents?

I'm working with SOLR on a project where we import a bunch (~40k items) of rich documents, mainly MS Word, PowerPoint, Excel and PDFs. Is there a best practice schema.xml and/or solrconfig.xml to use in SOLR when using the ExtractingRequestHandler?…

solr lucene full-text-search apache-tika solr-cell

asked Dec 05 '11 at 23:31

Pål Brattberg


5 votes, 1 answer

Indexing PDF with page numbers with Solr

I'm indexing PDFs with Solr using the ExtractingRequestHandler. I would like to display the page number along with hits in a document, e.g. "term foo was found in bar.pdf on pages 2, 3 and 5." Is it possible to include page numbers in the query…

pdf solr full-text-search apache-tika solr-cell

asked Nov 04 '10 at 06:05

Daniel Hepper


5 votes, 2 answers

How to configure Apache Tika with Apache Solr 1.4.1

I want to index a large number of PDF documents. I have found a reference showing that it can be done using Apache Tika, but unfortunately I cannot find any reference that describes how I could configure Apache Tika in Solr 1.4.1. Once configured I do…

solr solrnet apache-tika solr-cell

asked Oct 05 '10 at 13:09

Ahsan Iqbal


5 votes, 1 answer

Solr ExtractingRequestHandler extracting "rect" in links

I am using Solr's ExtractingRequestHandler to extract and index HTML content. My issue concerns the extracted links section that it produces: the extracted content contains "rect" where it does not exist in the HTML source. I have…

solr apache-tika solr-cell

asked Mar 04 '14 at 17:21

jakelley


5 votes, 5 answers

Textual content without metadata from Tika via SolrCell

Using Solr 3.6 and the ExtractingRequestHandler (aka Tika), is it possible to map just the textual content (of a PDF) to a field, minus the metadata? The "content" field produced by Tika unfortunately contains all the metadata munged in with the text…

solr apache-tika solr-cell

asked Jun 04 '12 at 21:43

Peaeater


4 votes, 2 answers

ExtractingRequestHandler - how do you post multi-valued literal fields?

I'm trying to post a literal, multi-valued field along with a PDF extract. Only one of the field values seems to be added to the index. Does this need to be passed in a different way? I am currently sending the equivalent of (via POST…

solr apache-tika solr-cell

asked Dec 15 '11 at 17:07

paulusm


4 votes, 1 answer

Getting the ExtractingRequestHandler to work in Solr

I am attempting to get Solr to work with Tika so I can index Word and PDF documents in my Drupal web site. I've looked at the Wiki page and this page and they indicate adding a requestHandler in solrconfig.xml. I did that and now Solr throws an…

drupal solr apache-tika solr-cell

asked Oct 27 '11 at 15:56

John81


4 votes, 1 answer

How to boost a SOLR document when indexing with /solr/update

To index my website, I have a Ruby script that in turn generates a shell script that uploads every file in my document root to Solr. The shell script has many lines that look like this: curl -s \ …

solr apache-tika solr-cell

asked Feb 09 '11 at 02:24

Dan Tenenbaum


4 votes, 2 answers

How do I index rich-format documents contained as database BLOBs with Solr 4.0+?

I've found a few related solutions to this problem. The related solutions will not work for me as I'll explain. (I'm using Solr 4.0 and indexing data stored in an Oracle 11g database.) Jonck van der Kogel's related solution (from 2009) is explained…

database solr blob apache-tika solr-cell

asked Feb 28 '13 at 23:34

DarkerIvy


3 votes, 1 answer

Adding fields to PDF files using SolrJ

I am a newbie to Solr. I am having a problem adding fields/metadata to PDF files while indexing them in Solr using ContentStreamUpdateRequest. As the literal parameter must be used to add fields, I tried the following: public static void…

solr solrj solr-cell

asked Mar 02 '12 at 13:14

user776193


3 votes, 1 answer

Solr's TikaEntityProcessor not working

I'm trying to get Solr to index a database in which one column is a filename of a PDF document I'd like to index. My configuration looks like this:

solr apache-tika solr-cell

asked Jun 01 '10 at 21:22

Brad G.


Questions tagged [solr-cell]