I'm having trouble with my document structure in Solr 8.5. Each document currently holds an array of pipe-delimited strings, and I'd like to turn that into an array of objects so the individual values are easier to query.
Original Document Structure:
{
  "ID": "123",
  "AnotherProperty": "Value",
  "PersonalInfo": [
    "PersonalName|Prop1|Prop2|Prop3|Age|Weight",
    "PersonalName|Prop1|Prop2|Prop3|Age|Weight"
  ]
}
Desired Document Structure:
{
  "ID": "123",
  "AnotherProperty": "Value",
  "PersonalInfo": [
    {
      "PersonalName": "Value",
      "Prop1": "Value",
      "Prop2": "Value",
      "Prop3": "Value",
      "Age": "Value",
      "Weight": "Value"
    },
    {
      "PersonalName": "Value",
      "Prop1": "Value",
      "Prop2": "Value",
      "Prop3": "Value",
      "Age": "Value",
      "Weight": "Value"
    }
  ]
}
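To make the mapping I'm after concrete, here is a rough sketch in plain JavaScript (run outside Solr) of how one delimited entry should turn into one object. The helper names `parsePersonalInfo` and `parseAllPersonalInfo` are made up for illustration, and the field order is an assumption taken from the placeholder strings above. It only shows the intended shape of the data, not how Solr should store it.

```javascript
// Assumed field order, taken from the placeholder string
// "PersonalName|Prop1|Prop2|Prop3|Age|Weight".
var FIELD_NAMES = ["PersonalName", "Prop1", "Prop2", "Prop3", "Age", "Weight"];

// Hypothetical helper: split one pipe-delimited entry into an object
// keyed by FIELD_NAMES. Values stay strings, as in the desired structure.
function parsePersonalInfo(entry) {
  var parts = entry.split("|");
  var obj = {};
  for (var i = 0; i < FIELD_NAMES.length; i++) {
    // A missing trailing value becomes null instead of undefined.
    obj[FIELD_NAMES[i]] = i < parts.length ? parts[i] : null;
  }
  return obj;
}

// Hypothetical helper: convert the whole PersonalInfo array.
function parseAllPersonalInfo(entries) {
  return entries.map(parsePersonalInfo);
}

// Example with made-up values.
var example = parseAllPersonalInfo(["Alice|a|b|c|30|60"]);
// example[0].PersonalName is "Alice" and example[0].Age is "30"
```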
I tried to solve this with a custom script transformer and a schema change, but in the resulting documents each transformed object is stored as a string instead of as a nested object.
Transformer script in data-config.xml:
// Builds one map per PersonalInfo entry and stores the list on the row.
function CombinePersonalInfoFields(row) {
    var personalFieldsArr = new java.util.ArrayList();
    for (var i = 0; i < row.get("PersonalInfo").size(); i++) {
        var personalFieldObj = new org.apache.solr.common.util.SimpleOrderedMap();
        personalFieldObj.add("PersonalName", row.get("PersonalName").get(i));
        personalFieldObj.add("Prop1", row.get("Prop1").get(i));
        personalFieldObj.add("Prop2", row.get("Prop2").get(i));
        personalFieldObj.add("Prop3", row.get("Prop3").get(i));
        personalFieldObj.add("Age", row.get("Age").get(i));
        personalFieldObj.add("Weight", row.get("Weight").get(i));
        personalFieldsArr.add(personalFieldObj);
    }
    row.put("Personal_Fields", personalFieldsArr);
    return row;
}

// Entry point referenced by the entity's transformer attribute.
function ProcessAllRows(row) {
    // Other logic here
    CombinePersonalInfoFields(row);
    return row;
}
<entity name="Document" transformer="script:ProcessAllRows"
        pk="ID"
        query="[dbo].[DATA_Pull] '${dih.request.FullLoad}', '${dih.last_index_time}'">
    <!-- Existing field mappings -->
</entity>
Adding a new field type in the managed-schema.xml file:
<fieldType name="child_document" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
</fieldType>
<field name="Personal_Fields" type="child_document" multiValued="true" indexed="true" stored="true"/>
After these changes, the indexed documents contain the expected values, but each object is stored as a single string, most likely because the child_document field type is based on solr.TextField.
Resulting Structure after applying the above solution:
{
  "ID": "123",
  "AnotherProperty": "Value",
  "PersonalInfo": [
    "{PersonalName: Value, Prop1: Value, Prop2: Value, Prop3: Value, Age: Value, Weight: Value}",
    "{PersonalName: Value, Prop1: Value, Prop2: Value, Prop3: Value, Age: Value, Weight: Value}"
  ]
}
I'd appreciate any guidance on what I might be missing or doing wrong here, and how to achieve the correct object structure so that I can query based on the child object fields.