0

My question

I have an application generate a list, which contains multiple layers of nested lists. My current implementation generates the desired results, but it also executes lots of query, exponential to the number of layers depth, to the database. So the overall performance is pretty poor (taking too much time). I'm trying to find a way to improve it but, to tell the true, I don't know which way is the better way.

So what I would like to know are:

  1. I have an improvement. Is my improvement acceptable? (won't create other issue)

  2. Is there any other way to improve, like:

    a. denormalize database tables

    b. use other ORM framework with better support

    c. Other better implementations for database design or for java code (could really use some help from you guys)

The original need

What I have are some tables with one-to-many relation:

Family
id       name
1        Smiths
2        Johnson
3        Williams

Member
id       family_id  name
1        1          David Smiths
2        1          Mary Smiths
3        1          William Smiths
3        2          David Johnson
4        3          David Williams

Asset
id       member_id  type     value
1        1          cash     100.00
2        1          share    200.00
3        1          security 100.00
4        2          cash     50.00

What I need is to generate a list of Families and their Asset. This should be a list of Family, each Family contains a list of Member, and each Member contains a list of Asset:

class Family{
    Integer id;
    String name;
    List<Member> members;
    // Getter and setters
}

class Member{
    Integer id;
    Integer family_id;
    String name;
    List<Asset> assets;
    // Getter and setters
}

class Asset{
    Integer id;
    Integer member_id;
    String type;
    BigDecimal value;
}

The result should be List<Family> families. Output to JSON:

{    
"families": [{
    "id": 1,
    "name": "Smiths",
    "members": [{
        "id": 1,
        "family_id": 1,
        "name": "David Smiths",
        "assets": [{
            "id": 1,
            "member_id": 1,
            "type": "cash",
            "value": "100.00"
        }, {
            "id": 2,
            "member_id": 1,
            "type": "share",
            "value": "200.00"
        }, {
            "id": 3,
            "member_id": 1,
            "type": "security",
            "value": "100.00"
        }]
    }]
}]
}

So far, I got two ways to do this:

Solution 1 Nested Loop Query

Java code

List<Family> generateFamilyList(){
    List<Family> families = resultMapper.selectFamily();

    // select Member for each Family
    for(Family family: families){

        List<Member> membersInFamily = resultMapper.selectMemberByFamily(family.getId());

        //select Asset for each Member
        for(Member member: membersInFamily){

            List<Asset> assetsOfMember = resultMapper.selectAssetByMember(member.getId());
            member.setAssets(assetsOfMember);
        }
        family.setMembers(membersInFamily );
    }

    return families;
}

And statements, using Mybatis.

<select id="selectFamily" resultType="Family">
    select id, name from family;
</select>

<select id="selectMemberByFamily" resultType="Member">
    select id, family_id, name from member
    where family_id = #{familyId};
</select>

<select id="selectAssetByMember" resultType="Asset">
    select id, member_id, type, value from asset
    where member_id = #{memberId};
</select>

This method does its job and generates correct JSON. But it also creates N^3 times of query to the database at the inner most select selectAssetByMember(member.getId())

I switch to use Mybatis collection select in the mapper, but found out this only make Mybatis Mapper do the for-loop query instead of doing in java code. The database still receives N^3 query.

So I came up with another solution:

Solution 2 Select all in a query, then filtered in Java

Java code

List<Family> generateFamilyList(){
    List<Family> families = resultMapper.selectFamily();

    List<Integer> allFamilyIds = familes.stream()
        .map(Family::getId)
        .collect(Collectors.toList());

    if(familyIds.isEmpty())
        //throw exception or return 

    // select all members in one query.
    // !Watch out! for Max Number of Allowable Parameters
    List<Member> allMembers = resultMapper.selectMemberByFamilyIds(allFamilyIds);

    List<Integer> allMemberIds = allMembers.stream()
        .map(Member::getId)
        .collect(Collectors.toList());

    if(allMemberIds.isEmpty())
        //throw exception or 
        //return a new empty ArrayList for this member's asset

    // select all ssets in one query.
    // !Watch out! for Max Number of Allowable Parameters
    List<Asset> allAssets = resultMapper.selectAssetByMemberIds(allMemberIds );

    // filter and set Members for each Family       
    for(Family family: families){

        List<Member> membersInFamily = allMembers.stream()
            .filter(member -> member.getFamilyId().equals(family.getId()))
            .collect(Collectors.toList());

        //select Asset for each Member
        for(Member member: membersInFamily){

            List<Asset> assetsOfMember = allAssets.stream()
                .filter(asset -> asset.getMemberId().equals(member.getId()))
                .collect(Collectors.toList());

            member.setAssets(assetsOfMember);
        }
        family.setMembers(membersInFamily );
    }

    return families;
}

And statements. The mybatis support a list of parameters:

<select id="selectFamily" resultType="Family">
    select id, name from family;
</select>

<select id="selectMemberByFamilyIds" resultType="Member">
    select id, family_id, name from member
    where family_id IN 
    <foreach> #{familyId} </foreach>;  
    <!-- ( 1, 2, 3, ....); the input list familyIds-->
    <!-- could cause SQLException if the ids more than the Max Number of Allowable Parameters of database -->
</select>

<select id="selectAssetByMemberIds" resultType="Asset">
    select id, member_id, type, value from asset
    where member_id IN
    <foreach> #{memberId} </foreach>
    <!-- ( 1, 2, 3, ....); the input list memberIds-->
    <!-- could cause SQLException if the ids more than the Max Number of Allowable Parameters of database -->
</select>

Result and Issues

  1. Query Number Reduced

  2. Memory usage in Java application Increased

  3. Overall application response time remarkably reduced(down to 20% - 10% in my cases)

  4. New Issue: Max Number of Allowable Parameters. Need to do paging before query if the parameter list memberIds size too big. The max number is limited and differs from types of database.

That's all I got for now. And I'm kind of stuck, don't know where should I improve my codes.

Thank you guys for watching my long long questions.

Community
  • 1
  • 1
Che-Chia Chang
  • 291
  • 2
  • 10
  • 1
    If you need best performance.. you can generate JSON itself within the DB. See examples here. https://dev.mysql.com/doc/refman/5.7/en/json.html – Chand Priyankara Jan 14 '17 at 02:38

2 Answers2

2

It really depends on how much data we're talking about, where its kept etc.

For really big data solution, we're prob saying something like a graph database in cassandra, connected via akka.

For medium load, the db will be on its own server, so you'll want to reduce the amount of connects and network use, so a single call and sort locally is best,

for really small data, where app and db is on the same server, multiple call to the db is ok and the db will be better and faster at sorting.

1

If you really want best performance, your filtering should go deep near the where data is. That means use your DB's capabilities to max. For example MySQL supports JSON_OBJECT to create JSON results. Ex:

SELECT
    member.id,
    member.name,
    member.family_id,
    JSON_OBJECT('id', asset.id,'member_id', asset.member_id, 'type', asset.type, 'value',
    asset.value) AS assets
FROM
    member
LEFT OUTER JOIN
    asset
ON
    member.id = asset.member_id

will give you following results. You can fine tune further.

1   David Smiths    1   {"id": 1, "type": "cash", "value": 100, "member_id": 1}
1   David Smiths    1   {"id": 2, "type": "share", "value": 200, "member_id": 1}
1   David Smiths    1   {"id": 3, "type": "security", "value": 100, "member_id": 1}
2   Mary Smiths 1   {"id": 4, "type": "cash", "value": 50, "member_id": 2}
3   William Smiths  1   {"id": null, "type": null, "value": null, "member_id": null}
4   David Johnson   2   {"id": null, "type": null, "value": null, "member_id": null}
5   David Williams  3   {"id": null, "type": null, "value": null, "member_id": null}

Hope this helps.

Chand Priyankara
  • 6,739
  • 2
  • 40
  • 63
  • 1
    Your answer helps a lot. I'm inspired. I improved my solutions not only by reducing the query numbers but by `Making every query count`. Now my query number down from N^3 to N (N is the number of family). The overall response time down to less than 10%. Thank you soo much, sir! – Che-Chia Chang Jan 15 '17 at 02:19