I am confused about the available field types: string vs strings, int vs ints, and likewise for the other data types.

What are the differences between the following 4?

<field name="string_multi" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="string_single" type="string" indexed="true" stored="true"/>
<field name="strings_multi" type="strings" multiValued="true" indexed="true" stored="true"/>
<field name="strings_single" type="strings" indexed="true" stored="true"/>

Given the document below, what should I declare for my field named hashtags?

string with multiValued, strings with multiValued, or strings without multiValued?

{
      "polarity":0.0,
      "text":"RT @socialistudents: Vlad - we go to NUS conference not just as individuals but as members of Socialist Students #SocStu17",
      "created_at":"Sun Feb 12 19:28:34 +0000 2017",
      "hashtags":[
         "hashtag1",
         "hashtag2"
      ],
      "subjectivity":0.0,
      "retweet_recount":4,
      "id":830861171582439424,
      "favorite_count":0
}
Gavin

1 Answer

If you mean the default field types that come with Solr's default schema, their fieldType definitions look like this:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true" />

Edited: The 2nd example should be strings instead of string

Both use the same class (solr.StrField, Solr's standard string class), so they hold the same type of data. The only difference is that 'strings' is declared multiValued, which means a single field can store multiple discrete values. A field inherits that setting from its type unless the field definition sets multiValued itself.
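
As a rough sketch of how this plays out for the four fields in the question, assuming the default `string` and `strings` definitions above and Solr's usual rule that an attribute set on a `<field>` overrides the one inherited from its `fieldType`, the comments below mark the effective setting of each field:

```xml
<!-- type "string" is single-valued by default, but the field overrides it: multiValued -->
<field name="string_multi" type="string" multiValued="true" indexed="true" stored="true"/>
<!-- nothing is overridden, so the type's default applies: single-valued -->
<field name="string_single" type="string" indexed="true" stored="true"/>
<!-- type "strings" is already multiValued, so repeating the attribute is redundant: multiValued -->
<field name="strings_multi" type="strings" multiValued="true" indexed="true" stored="true"/>
<!-- inherits multiValued="true" from "strings", despite the field name: multiValued -->
<field name="strings_single" type="strings" indexed="true" stored="true"/>
```

Under these assumptions, only `string_single` is single-valued; the other three behave the same way.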

In your example, hashtags is an array of individual hashtag values. Since you want to store multiple discrete strings in one field, 'strings' is the right choice because it is multiValued.
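
A minimal sketch of what the hashtags declaration could look like, assuming the default `strings` type shown above is present in your schema. The `indexed` and `stored` values are copied from the question's field definitions and may need adjusting for your use case:

```xml
<!-- Relies on multiValued="true" inherited from the "strings" fieldType -->
<field name="hashtags" type="strings" indexed="true" stored="true"/>

<!-- Equivalent alternative using the single-valued type with an explicit override.
     Declare only one field named "hashtags", not both.
<field name="hashtags" type="string" multiValued="true" indexed="true" stored="true"/>
-->
```

Either form should accept the JSON array from the example document as separate values.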

Jayce444
  • The confusion was in the `field` definition, where you can also declare the `multiValued` attribute, as in ``. They turn out to be the same. It is just a little confusing, and it makes `multiValued` on the `field` redundant, since `multiValued` is already determined by the `fieldType` – Gavin Feb 13 '17 at 11:09
  • Yes correct, sorry I missed the 's' on the second one. Fixed – Jayce444 Feb 13 '17 at 11:11
  • Yeah, you can declare `multiValued` on the field as well. So given your hashtags data structure, you can put `multiValued="true"` on your field definition – Jayce444 Feb 13 '17 at 11:13