2

when I'm trying to save big list of entities (77832 elements) in PostgreSQL. But after performing "saveAll" method there are only 49207 entries in table (table was empty before adding items). According to debugger list size doesn't change. During saving data there are no errors in application and database log.

Here is entity classes:

@Getter
@Setter
@Entity
@Table(name = "faction")
@NoArgsConstructor
@AllArgsConstructor
public class Faction {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "name", unique = true, nullable = false)
    private String name;

    @ManyToOne(cascade = CascadeType.MERGE, optional = false)
    @JoinColumn(name = "allegiance_id", nullable = false)
    private Allegiance allegiance;

    @ManyToOne(cascade = CascadeType.MERGE, optional = false)
    @JoinColumn(name = "government_id", nullable = false)
    private Government government;

    @Column(name = "is_player_faction", nullable = false)
    private Boolean isPlayerFaction;
}
@Entity
@Table(name = "allegiance")
@Getter
@Setter
@ToString
@NoArgsConstructor
@AllArgsConstructor
public class Allegiance {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "name", unique = true, nullable = false)
    private String name;
}

And method which implements saving data logic:

public List<FactionDto> saveFactions(List<FactionDto> factionDtos) {
        var factions = factionDtos.stream()
                                  .map(factionMapper::toEntity)
                                  .toList();

        var governments = factionDtos.stream()
                                     .map(FactionDto::getGovernment)
                                     .collect(Collectors.toSet())
                                     .stream()
                                     .map(item -> new Government(null, item.getName()))
                                     .collect(Collectors.toSet());
        Map<String, Government> governmentMap = governmentRepository
                .saveAll(governments)
                .stream()
                .collect(Collectors.toMap(Government::getName, item -> item));

        var allegiances = factionDtos.stream()
                                     .map(FactionDto::getAllegiance)
                                     .collect(Collectors.toSet())
                                     .stream()
                                     .map(item -> new Allegiance(null, item.getName()))
                                     .collect(Collectors.toSet());
        Map<String, Allegiance> allegianceMap = allegianceRepository
                .saveAll(allegiances)
                .stream()
                .collect(Collectors.toMap(Allegiance::getName, allegiance -> allegiance));

        factions = factions.stream()
                           .peek(faction -> {
                               var allegiance = allegianceMap.get(faction.getAllegiance().getName());
                               faction.setAllegiance(allegiance);
                               var government = governmentMap.get(faction.getGovernment().getName());
                               faction.setGovernment(government);
                           })
                           .collect(Collectors.toList());


        return factionRepository.saveAll(factions).stream()
                                .map(factionMapper::toDto)
                                .toList();
    }

Debugger shows there are exactly 77832 elements in collection passed for saving. There are no duplicates

enter image description here

In my opinion there are should be same number of entries created or at least error message if there are conflicts

BANTYC
  • 151
  • 1
  • 9
  • UPD: changing bulk save to ```factions.forEach(factionRepository::save);``` made saving longer but there are at least all entries present. However I still have no idea why bulk save works this way – BANTYC Mar 07 '23 at 14:41
  • This way stopped working today. Now storing 49208/77832 – BANTYC Mar 09 '23 at 07:27
  • Try changing `peek(faction ->{XXX})` to `map(faction -> {XXX; return faction;})` – Bohemian Mar 09 '23 at 18:30
  • @Bohemian, already changed to that way and no result – BANTYC Mar 09 '23 at 19:37
  • I see that the list is converted into a set. Could it be that by removing duplicates, the set gets shorter that the list? – Maurice Perry Mar 13 '23 at 07:50

3 Answers3

2

The faction.name values contain duplicates. The name column is unique so only the unique names remain. Test it by doing the following immediately before the save.

var uniqueFactionByName = factions.stream()
                                     .map(faction -> faction.name)
                                     .collect(Collectors.toSet())
System.out.println(uniqueFactionByName.size());
John Williams
  • 4,252
  • 2
  • 9
  • 18
  • 1
    This is resource from website with all factions present in game. There are only unique names and moreover some names are missing after saving and SQL query returned 0 name duplicates. Entity field "name" marked as unique and if were duplicated values hibernate would throw an exception – BANTYC Mar 10 '23 at 05:40
1

changing bulk save to factions.forEach(factionRepository::save); made saving longer but there are at least all entries present. However I still have no idea why bulk save works this way

This works because your business method is not annotated with @Transactional and thus you now save each Faction object in a separate transaction. See the code of SimpleJpaRepository encapsulating the basic functionality for Spring Data's JpaRepository:

//one transaction for all items
@Transactional
public <S extends T> List<S> saveAll(Iterable<S> entities) {
    Assert.notNull(entities, "Entities must not be null!");
    List<S> result = new ArrayList();
    Iterator var3 = entities.iterator();

    while(var3.hasNext()) {
        S entity = var3.next();
        result.add(this.save(entity));
    }

    return result;
}

//a separate transaction for each item
@Transactional
public <S extends T> S save(S entity) {
    Assert.notNull(entity, "Entity must not be null.");
    if (this.entityInformation.isNew(entity)) {
        this.em.persist(entity);
        return entity;
    } else {
        return this.em.merge(entity);
    }
}

As you see JpaRepository.saveAll() just calls JpaRepository.save() in a loop, so the logic is the same but the only difference is about having 1 transaction instead of n.

Now, why some of the items are lost. The cause is @GeneratedValue(strategy = GenerationType.IDENTITY) which is not supported by PostgreSQL, see https://vladmihalcea.com/hibernate-identity-sequence-and-table-sequence-generator/.

With GenerationType.IDENTITY you insert a row into your table without ID and your DBMS assigns it automatically, so you have no control over it. If DBMS for some reason detects two rows as duplicated within one and the same transaction it might squash them into one and I suspect this is what happens in this case.

I suggest you to use sequence for primary key generation, this would solve your problem. If sequence cannot be applied in your case then try to split entire collection of saved Faction objects into smaller chunks, e.g. of size 500, something like:

Lists.partition(factions, 500)
  .stream()
  .map(factionRepository::saveAll)
  .flatMap(List::stream)
  .map(factionMapper::toDto)
  .toList();

You'll still have n/500 transactions instead of 1, but this could help. The correct solution, however, is to use sequence.

Sergey Tsypanov
  • 3,265
  • 3
  • 8
  • 34
  • *"which is not supported by PostgreSQL"* It's perfectly possible to use `GenerationType.IDENTITY` with PostgreSQL, as explained [here](https://stackoverflow.com/questions/40497768/jpa-and-postgresql-with-generationtype-identity). – Olivier Mar 13 '23 at 08:19
  • It's possible in case you've declared id column as SERIAL. We don't have DDL in this question and as soon as item-by-item insert works I guess the id might be SERIAL. So again we come to the root issue: we are not in control of assigning ids which apparently works differently depending on whether we are saving in bulk or one-by-one. – Sergey Tsypanov Mar 13 '23 at 08:57
1

I guess you didn't implement FactionRepository.saveAll yourself but rely on the Spring Data default? Spring Data doesn't know if save should map to persist or merge, so it tries to understand what to do by looking at the entity e.g. if the entity has a PK set, it will assume merge. I don't know if the version you are using has some other strategies to determine this, but I think that could be the source of your problem.

Are you certain that the objects do not contain a value for the id attribute i.e. factions.stream().noneMatch(f -> f.getId() != null)?

With all that DTO transformation code though, it's hard to see what's going on. I also guess that you just posted an excerpt and not the actual code? So there are further possibly mismatches which makes it hard for us to help you, as the bug could lie somewhere in the code you are not showing.

Either way, I think this is a perfect use case for Blaze-Persistence Entity Views.

I created the library to allow easy mapping between JPA models and custom interface or abstract class defined models, something like Spring Data Projections on steroids. The idea is that you define your target structure(domain model) the way you like and map attributes(getters) via JPQL expressions to the entity model.

A DTO model for your use case could look like the following with Blaze-Persistence Entity-Views:

@EntityView(Faction.class)
@CreatableEntityView
public interface FactionDto {
    @IdMapping
    Long getId();
    String getName();
    void setName(String name);
    Boolean getIsPlayerFaction();
    void setIsPlayerFaction(Boolean isPlayerFaction);
    GovernmentDto getGovernment();
    void setGovernment(GovernmentDto government);
    AllegianceDto getAllegiance();
    void setAllegiance(AllegianceDto allegiance);

    @EntityView(Allegiance.class)
    @CreatableEntityView
    interface AllegianceDto {
        @IdMapping
        Long getId();
        String getName();
        void setName(String name);
    }

    @EntityView(Government.class)
    @CreatableEntityView
    interface GovernmentDto {
        @IdMapping
        Long getId();
        String getName();
        void setName(String name);
    }
}

Querying is a matter of applying the entity view to a query, the simplest being just a query by id.

FactionDto a = entityViewManager.find(entityManager, FactionDto.class, id);

The Spring Data integration allows you to use it almost like Spring Data Projections: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-features

Page<FactionDto> findAll(Pageable pageable);

The best part is, it will only fetch the state that is actually necessary!

Saving is also supported and as simple as defining and using this method in a repository:

void saveAll(List<FactionDto> dtos);
Christian Beikov
  • 15,141
  • 2
  • 32
  • 58