I have two entities, Etf
and DataPoint
. Each Etf
has multiple DataPoints
. See structure below
public class Etf
{
public Etf()
{
DataPoints = new HashSet<DataPoint>();
}
public string Id { get; set; }
public ICollection<DataPoint> DataPoints { get; private set; }
}
public class DataPoint {
public string EtfId { get; set; }
public Etf Etf { get; set; }
public DateTime Date { get; set; }
}
It happens that multiple datapoints with the same EtfId
and Date
are inserted into the database. I would like to remove the duplicates based on these two fields.
In SQL, I have tried this:
WITH CTE AS
(SELECT Id, ROW_NUMBER() OVER
(PARTITION BY EtfId, Date ORDER BY EtfId, Date DESC)
AS ROW_NUMBER FROM DataPoints)
DELETE FROM CTE WHERE ROW_NUMBER > 1;
Which gives me an error that CTE is not an updatable table (perhaps since it is connected to both Etf
as well as DataPoint
). I then tried, based on this answer, the following in LINQ:
// Get the duplicate indices
var idxs = await _context.DataPoints
.GroupBy(g => new {g.EtfId, g.Date})
.OrderBy(g => g.Key.EtfId)
.ThenByDescending(g => g.Key.Date)
.SelectMany(g =>
g.Select((i, idx) => new { i.Id, Idx = idx }))
.Where(g => g.Idx > 0)
.Select(g => g.Id)
.ToListAsync(cancellationToken);
// Get the duplicate entities from indices
var duplicates = await _context.DataPoints
.Where(x => idxs.Contains(x.Id))
.ToListAsync(cancellationToken);
// Remove them
_context.DataPoints.RemoveRange(duplicates);
However, this approach gives me a System.InvalidOperationException
at the Select((i, idx) => ..)
statement saying it may be a bug or limitation from EF Core.
Is there any better (or just working) method that I can use?