
My team uses TFS 2015 as our ALM and version control system, and I want to analyse which files change most frequently.

I found that TFS doesn't provide this functionality out of the box, but TFS 2015 has a REST API for querying the changesets of a file, like this:

http://{instance}/tfs/DefaultCollection/_apis/tfvc/changesets?searchCriteria.itemPath={filePath}&api-version=1.0

There are thousands of files in my project repository, so querying them one by one is not a good idea. Is there a better solution?

Asked by Allen; edited by Elmar.

1 Answer


I don't think there's a de facto, out-of-the-box solution to your question. I tried two separate approaches: I initially focused on the REST API, but later switched to the SOAP API to see which features it supports.

For all of the options below, the following API should suffice.

Install the client API from NuGet (version 14.89.0 or later):

Install-Package Microsoft.TeamFoundationServer.ExtendedClient -Version 14.89.0

All options also require the following extension method:

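    using System.Collections.Generic;
    using System.Linq;

    public static class StringExtensions
    {
        //True if source contains any of the strings in lookFor (case-sensitive substring match)
        public static bool ContainsAny(this string source, List<string> lookFor)
        {
            if (!string.IsNullOrEmpty(source) && lookFor != null && lookFor.Count > 0)
            {
                return lookFor.Any(source.Contains);
            }
            return false;
        }
    }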
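ContainsAny performs a case-sensitive substring test rather than an exact comparison, which is why the options below need an extensionExclusions list: ".css" and ".csproj" both contain ".cs". A minimal sketch contrasting the two behaviours; the ExtensionMatching class and its method names are illustrative only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtensionMatching
{
    // Same test as ContainsAny: does the extension contain any entry as a substring?
    public static bool MatchesBySubstring(string extension, List<string> lookFor) =>
        !string.IsNullOrEmpty(extension) && lookFor.Any(extension.Contains);

    // Exact, case-insensitive comparison: ".css" no longer matches ".cs"
    public static bool MatchesExactly(string extension, List<string> lookFor) =>
        lookFor.Contains(extension, StringComparer.OrdinalIgnoreCase);
}
```

With exact matching, an include list such as { ".cs", ".js" } is enough on its own; with substring matching, the exclusion list has to catch the false positives.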

OPTION 1: SOAP API

With the SOAP API you are not required to limit the number of query results with the maxCount parameter, as explained in this excerpt from the IntelliSense documentation of the QueryHistory method:

maxCount: This parameter allows the caller to limit the number of results returned. QueryHistory pages results back from the server on demand, so limiting your own consumption of the returned IEnumerable is almost as effective (from a performance perspective) as providing a fixed value here. The most common value to provide for this parameter is Int32.MaxValue.
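The point of that note is that a lazily paged sequence only fetches the pages its consumer actually reads. The sketch below is not the TFS client: it simulates a paged server with a hypothetical fetchPage(skip, top) delegate to show why stopping early (for example with Take) is almost as cheap as passing a small maxCount.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyPaging
{
    // Yields items page by page; the next page is requested only when the
    // consumer asks for an item beyond the current page
    public static IEnumerable<T> Paged<T>(Func<int, int, IReadOnlyList<T>> fetchPage, int pageSize)
    {
        var skip = 0;
        while (true)
        {
            var page = fetchPage(skip, pageSize);
            foreach (var item in page)
            {
                yield return item;
            }
            if (page.Count < pageSize)
            {
                yield break; // a short page means nothing is left on the "server"
            }
            skip += pageSize;
        }
    }
}
```

Taking the first 150 items of a 1,000-item source with a page size of 100 triggers only two page requests; enumerating everything triggers eleven (ten full pages plus one empty page that ends the loop).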

Based on the maxCount documentation, I decided to extract statistics for each product in my source control system separately. It can be very valuable to see how much code flux each system in the codebase has on its own, rather than limiting the results to the top 10 files across an entire codebase that may contain hundreds of systems.
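Per-system statistics amount to grouping change records by the root path they were found under and keeping the top N files in each group. A minimal in-memory sketch; the FileChange type and its members are hypothetical stand-ins for whatever the SOAP or REST client returns:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened change record: the scanned root and the file that changed
public sealed class FileChange
{
    public string Root { get; }
    public string FilePath { get; }
    public FileChange(string root, string filePath) { Root = root; FilePath = filePath; }
}

public static class PerRootStats
{
    // For every root, count changes per file and keep the topN most frequently changed files
    public static Dictionary<string, List<KeyValuePair<string, int>>> TopFilesPerRoot(
        IEnumerable<FileChange> changes, int topN)
    {
        return changes
            .GroupBy(c => c.Root)
            .ToDictionary(
                rootGroup => rootGroup.Key,
                rootGroup => rootGroup
                    .GroupBy(c => c.FilePath)
                    .Select(fileGroup => new KeyValuePair<string, int>(fileGroup.Key, fileGroup.Count()))
                    .OrderByDescending(kv => kv.Value)
                    .ThenBy(kv => kv.Key, StringComparer.Ordinal) // deterministic order for ties
                    .Take(topN)
                    .ToList());
    }
}
```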


Install the SOAP API client from NuGet:

Install-Package Microsoft.TeamFoundationServer.ExtendedClient -Version 14.95.2

Limiting criteria:

  1. Only scan specific paths in source control, since some systems are older and possibly kept only for historical purposes.
  2. Only include certain file extensions, e.g. .cs, .js.
  3. Exclude certain file names, e.g. AssemblyInfo.cs.
  4. Items extracted for each path: 10.
  5. From date: 120 days ago.
  6. To date: today.
  7. Exclude specific paths, e.g. folders containing release or archived branches.

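The filtering criteria above (everything except the top-10 limit, which applies after counting) can be expressed as one predicate per changed item. A minimal sketch; ScanCriteria and its members are hypothetical, file names and extensions are compared exactly and case-insensitively rather than with ContainsAny, and TFVC paths are assumed to be case-insensitive:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical holder for the limiting criteria listed above
public sealed class ScanCriteria
{
    public List<string> RootPaths { get; set; } = new List<string>();
    public List<string> ExtensionsToInclude { get; set; } = new List<string>();
    public List<string> FileNamesToExclude { get; set; } = new List<string>();
    public List<string> PathFragmentsToExclude { get; set; } = new List<string>();
    public DateTime From { get; set; }
    public DateTime To { get; set; }

    // True if a change to serverPath made at changeDate should be counted
    public bool Accepts(string serverPath, DateTime changeDate)
    {
        // Date window
        if (changeDate < From || changeDate > To) return false;

        // Only the configured roots ("$/a/b" is under "$/a", "$/ab" is not)
        if (!RootPaths.Any(root => serverPath.StartsWith(root.TrimEnd('/') + "/", StringComparison.OrdinalIgnoreCase)))
            return false;

        // Excluded folders such as release or archived branches
        if (PathFragmentsToExclude.Any(f => serverPath.IndexOf(f, StringComparison.OrdinalIgnoreCase) >= 0))
            return false;

        // Excluded file names such as AssemblyInfo.cs
        var fileName = serverPath.Substring(serverPath.LastIndexOf('/') + 1);
        if (FileNamesToExclude.Contains(fileName, StringComparer.OrdinalIgnoreCase))
            return false;

        // Included extensions only; a name without '.' has no extension
        var dot = fileName.LastIndexOf('.');
        var extension = dot >= 0 ? fileName.Substring(dot) : string.Empty;
        return ExtensionsToInclude.Contains(extension, StringComparer.OrdinalIgnoreCase);
    }
}
```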
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.VersionControl.Client;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public class TopChangedFilesReport
{
    public void GetTopChangedFilesSoapApi()
    {
        var tfsUrl = "https://<SERVERNAME>/tfs/<COLLECTION>";
        var domain = "<DOMAIN>";
        var password = "<PASSWORD>";
        var userName = "<USERNAME>";

        //Only interested in specific systems, so only these paths are scanned
        var directoriesToScan = new List<string> {
            "$/projectdir/subdir/subdir/subdirA/systemnameA",
            "$/projectdir/subdir/subdir/subdirB/systemnameB",
            "$/projectdir/subdir/subdir/subdirC/systemnameC",
            "$/projectdir/subdir/subdir/subdirD/systemnameD"
        };

        var maxResultsPerPath = 10;
        var fromDate = DateTime.Now.AddDays(-120);
        var toDate = DateTime.Now;

        var fileExtensionToInclude = new List<string> { ".cs", ".js" };
        var extensionExclusions = new List<string> { ".csproj", ".json", ".css" };
        var fileExclusions = new List<string> { "AssemblyInfo.cs", "jquery-1.12.3.min.js", "config.js" };
        var pathExclusions = new List<string> {
            "/subdirToForceExclude1/",
            "/subdirToForceExclude2/",
            "/subdirToForceExclude3/",
        };

        using (var collection = new TfsTeamProjectCollection(new Uri(tfsUrl),
            new NetworkCredential(userName: userName, password: password, domain: domain)))
        {
            collection.EnsureAuthenticated();

            var tfvc = collection.GetService(typeof(VersionControlServer)) as VersionControlServer;

            foreach (var rootDirectory in directoriesToScan)
            {
                //Get changesets. maxCount is int.MaxValue because QueryHistory pages results back
                //from the server on demand. This overload returns a non-generic IEnumerable,
                //so Cast<Changeset>() is used rather than an 'as' cast
                var changeSets = tfvc.QueryHistory(path: rootDirectory, version: VersionSpec.Latest,
                    deletionId: 0, recursion: RecursionType.Full, user: null,
                    versionFrom: new DateVersionSpec(fromDate), versionTo: new DateVersionSpec(toDate),
                    maxCount: int.MaxValue, includeChanges: true,
                    includeDownloadInfo: false, slotMode: true)
                    .Cast<Changeset>();

                //Keep changes that are not lock, delete or property changes, then apply the
                //path, file-name and extension filters. Path.GetFileName/GetExtension also
                //handle items without a '.', where Substring(LastIndexOf('.')) would throw
                var changes = changeSets.SelectMany(a => a.Changes)
                    .Where(a => a.ChangeType != ChangeType.Lock && a.ChangeType != ChangeType.Delete && a.ChangeType != ChangeType.Property)
                    .Where(e => !e.Item.ServerItem.ContainsAny(pathExclusions))
                    .Where(e => !System.IO.Path.GetFileName(e.Item.ServerItem).ContainsAny(fileExclusions))
                    .Where(e => !System.IO.Path.GetExtension(e.Item.ServerItem).ContainsAny(extensionExclusions))
                    .Where(e => System.IO.Path.GetExtension(e.Item.ServerItem).ContainsAny(fileExtensionToInclude))
                    .GroupBy(g => g.Item.ServerItem)
                    .Select(d => new { File = d.Key, Count = d.Count() })
                    .OrderByDescending(o => o.Count)
                    .Take(maxResultsPerPath);

                //Write top items for each path to the console
                Console.WriteLine(rootDirectory);
                Console.WriteLine("->");
                foreach (var change in changes)
                {
                    Console.WriteLine("ChangeCount: {0} : File: {1}", change.Count, change.File);
                }
                Console.WriteLine(Environment.NewLine);
            }
        }
    }
}
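A note on the change-type filter: ChangeType (and VersionControlChangeType in the REST client) is a [Flags] enum, so one change can carry several values at once, e.g. a delete combined with a rename. The sketch below uses a local stand-in enum (not the TFS type; names and values are illustrative) to show that a chain of != joined by || is always true, that && only rejects exact single values, and that a bitmask test rejects any change carrying an unwanted flag:

```csharp
using System;

// Local stand-in for a [Flags] change-type enum; values are illustrative, not the TFS values
[Flags]
public enum DemoChangeType
{
    None = 0,
    Add = 1,
    Edit = 2,
    Rename = 4,
    Delete = 8,
    Lock = 16,
    Property = 32
}

public static class ChangeTypeFilter
{
    private const DemoChangeType Unwanted = DemoChangeType.Lock | DemoChangeType.Delete | DemoChangeType.Property;

    // A value cannot equal Lock and Delete at the same time, so this is always true
    public static bool KeepWithOr(DemoChangeType t) =>
        t != DemoChangeType.Lock || t != DemoChangeType.Delete || t != DemoChangeType.Property;

    // Rejects only the exact single values; combinations such as Delete | Rename still pass
    public static bool KeepWithAnd(DemoChangeType t) =>
        t != DemoChangeType.Lock && t != DemoChangeType.Delete && t != DemoChangeType.Property;

    // Rejects any change that carries at least one unwanted flag
    public static bool KeepWithMask(DemoChangeType t) => (t & Unwanted) == 0;
}
```

Which variant fits depends on whether, say, an edit that also takes a lock should still be counted; the mask version is the stricter choice.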

OPTION 2A: REST API

(A problem identified by the OP led to the discovery of a critical defect in versions v.xxx to 14.95.4 of the API; OPTION 2B is the workaround.)

Defect discovered in versions v.xxx to 14.95.4 of the API: the TfvcChangesetSearchCriteria type contains an ItemPath property that is supposed to limit the search to a specified directory. Its default value is $/. Unfortunately, GetChangesetsAsync always uses the root path of the TFVC repository, irrespective of the value set.
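While the ItemPath criterion is being ignored, one way to keep per-path statistics meaningful is to re-check every returned item path on the client. A minimal sketch, assuming TFVC server paths use '/' as the separator and compare case-insensitively; IsUnderPath is a hypothetical helper, not part of the client library:

```csharp
using System;

public static class TfvcPaths
{
    // True if itemPath is rootPath itself or lies underneath it
    // ("$/a/b" is under "$/a", but "$/ab" is not)
    public static bool IsUnderPath(string itemPath, string rootPath)
    {
        if (itemPath == null || rootPath == null) return false;

        var root = rootPath.TrimEnd('/');
        if (itemPath.Equals(root, StringComparison.OrdinalIgnoreCase)) return true;

        return itemPath.StartsWith(root + "/", StringComparison.OrdinalIgnoreCase);
    }
}
```

In the REST examples, such a check could be one more Where clause on e.Item.Path for the current rootDirectory.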

That said, this would still be a reasonable approach once the defect is fixed.

One way to limit the impact on your SCM system is to specify limiting criteria for the queries through the TfvcChangesetSearchCriteria parameter of the GetChangesetsAsync method of the TfvcHttpClient type.

You don't necessarily need to check each file in your SCM system or project individually; checking the changesets for the specified period may be enough. Not all of the limiting values I used below are properties of the TfvcChangesetSearchCriteria type, so I've written a short example to show how I would do it, e.g. specifying the maximum number of changesets to consider initially and the specific project you want to look at.

Note: The TfvcChangesetSearchCriteria type contains some additional properties that you may want to consider using.

In the example below I've used the REST API from a C# client to get results from TFVC.
If you intend to use a different client language and invoke the REST services directly (e.g. from JavaScript), the logic below should still give you some pointers.

//targeted framework for example: 4.5.2
using Microsoft.TeamFoundation.SourceControl.WebApi;
using Microsoft.VisualStudio.Services.Client;
using Microsoft.VisualStudio.Services.Common;

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
public class TopChangedFilesReport
{
    public async Task GetTopChangedFilesUsingRestApi()
    {
        var tfsUrl = "https://<SERVERNAME>/tfs/<COLLECTION>";
        var domain = "<DOMAIN>";
        var password = "<PASSWORD>";
        var userName = "<USERNAME>";

        //Criteria used to limit results
        var directoriesToScan = new List<string> {
            "$/projectdir/subdir/subdir/subdirA/systemnameA",
            "$/projectdir/subdir/subdir/subdirB/systemnameB",
            "$/projectdir/subdir/subdir/subdirC/systemnameC",
            "$/projectdir/subdir/subdir/subdirD/systemnameD"
        };

        var maxResultsPerPath = 10;
        var fromDate = DateTime.Now.AddDays(-120);
        var toDate = DateTime.Now;

        var fileExtensionToInclude = new List<string> { ".cs", ".js" };
        var folderPathsToInclude = new List<string> { "/subdirToForceInclude/" };
        var extensionExclusions = new List<string> { ".csproj", ".json", ".css" };
        var fileExclusions = new List<string> { "AssemblyInfo.cs", "jquery-1.12.3.min.js", "config.js" };
        var pathExclusions = new List<string> {
            "/subdirToForceExclude1/",
            "/subdirToForceExclude2/",
            "/subdirToForceExclude3/",
        };

        //Establish connection
        var connection = new VssConnection(new Uri(tfsUrl),
            new VssCredentials(new Microsoft.VisualStudio.Services.Common.WindowsCredential(new NetworkCredential(userName, password, domain))));

        //Get tfvc client
        var tfvcClient = await connection.GetClientAsync<TfvcHttpClient>();

        foreach (var rootDirectory in directoriesToScan)
        {
            //Date-range and path criteria for the query; ISO 8601 dates avoid culture-dependent formats
            var criteria = new TfvcChangesetSearchCriteria
            {
                FromDate = fromDate.ToString("s", System.Globalization.CultureInfo.InvariantCulture),
                ToDate = toDate.ToString("s", System.Globalization.CultureInfo.InvariantCulture),
                ItemPath = rootDirectory //ignored by the affected API versions (see the defect note above)
            };

            //Get changesets
            var changeSets = await tfvcClient.GetChangesetsAsync(
                maxChangeCount: int.MaxValue,
                includeDetails: false,
                includeWorkItems: false,
                searchCriteria: criteria);

            if (changeSets.Any())
            {
                //Get the changes of each changeset; awaited one at a time because
                //List<T>.AddRange is not thread-safe and must not be called from Parallel.ForEach
                var sample = new List<TfvcChange>();
                foreach (var changeSet in changeSets)
                {
                    sample.AddRange(await tfvcClient.GetChangesetChangesAsync(changeSet.ChangesetId));
                }

                //Keep changes that are not lock, delete or property changes, then filter by path,
                //file name and extension (Path.GetExtension also handles items without a '.')
                var changes = sample
                    .Where(a => a.ChangeType != VersionControlChangeType.Lock && a.ChangeType != VersionControlChangeType.Delete && a.ChangeType != VersionControlChangeType.Property)
                    .Where(e => e.Item.Path.ContainsAny(folderPathsToInclude))
                    .Where(e => !e.Item.Path.ContainsAny(pathExclusions))
                    .Where(e => !System.IO.Path.GetFileName(e.Item.Path).ContainsAny(fileExclusions))
                    .Where(e => !System.IO.Path.GetExtension(e.Item.Path).ContainsAny(extensionExclusions))
                    .Where(e => System.IO.Path.GetExtension(e.Item.Path).ContainsAny(fileExtensionToInclude))
                    .GroupBy(g => g.Item.Path)
                    .Select(d => new { File = d.Key, Count = d.Count() })
                    .OrderByDescending(o => o.Count)
                    .Take(maxResultsPerPath);

                //Write top items for each path to the console
                Console.WriteLine(rootDirectory);
                Console.WriteLine("->");
                foreach (var change in changes)
                {
                    Console.WriteLine("ChangeCount: {0} : File: {1}", change.Count, change.File);
                }
                Console.WriteLine(Environment.NewLine);
            }
        }
    }
}
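Fetching the changes for many changesets can also run concurrently, as long as results are not added to a shared List<T> from several threads, since List<T> is not thread-safe. A minimal sketch that caps the number of in-flight requests with SemaphoreSlim; fetchChanges stands in for a call such as tfvcClient.GetChangesetChangesAsync, and the concurrency limit is an arbitrary choice:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedFetch
{
    // Calls fetchChanges for every changeset id with at most maxConcurrency calls in flight,
    // then flattens the results; each task returns its own list, so nothing shared is mutated
    public static async Task<List<TChange>> FetchAllAsync<TChange>(
        IEnumerable<int> changesetIds,
        Func<int, Task<List<TChange>>> fetchChanges,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = changesetIds.Select(async id =>
            {
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await fetchChanges(id).ConfigureAwait(false);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            // WhenAll keeps the input order, so results are grouped per changeset id
            var perChangeset = await Task.WhenAll(tasks).ConfigureAwait(false);
            return perChangeset.SelectMany(changes => changes).ToList();
        }
    }
}
```

A small limit (for example 4) keeps the load on the TFS server predictable while still overlapping network latency.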

OPTION 2B

Note: This solution is very similar to OPTION 2A, except that it works around a limitation in the REST client API library at the time of writing. In brief, instead of invoking the client API library to get changesets, this example sends a web request directly to the REST API, so additional types had to be defined to handle the service's response.
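Because the changesets are requested through a hand-built URL, every query value needs URL escaping: the item path contains $ and /, and a formatted date contains colons (or, with culture-specific formats, spaces and slashes). A minimal sketch, assuming the query parameters of the URL template used in the code below and ISO 8601 date strings; BuildChangesetsUrl is a hypothetical helper:

```csharp
using System;
using System.Globalization;

public static class ChangesetsUrl
{
    // Culture-invariant ISO 8601 timestamp, e.g. 2016-04-10T18:16:00
    private static string ToIso(DateTime d) => d.ToString("s", CultureInfo.InvariantCulture);

    // Builds the TFVC changesets query for one path and date range; all values are URL-escaped
    public static string BuildChangesetsUrl(string collectionUrl, string itemPath,
        DateTime fromDate, DateTime toDate, int top)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0}/_apis/tfvc/changesets?searchCriteria.itemPath={1}&searchCriteria.fromDate={2}&searchCriteria.toDate={3}&$top={4}&api-version=1.0",
            collectionUrl.TrimEnd('/'),
            Uri.EscapeDataString(itemPath),
            Uri.EscapeDataString(ToIso(fromDate)),
            Uri.EscapeDataString(ToIso(toDate)),
            top);
    }
}
```

For example, the path $/proj/systemA is sent as %24%2Fproj%2FsystemA.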

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

using Microsoft.TeamFoundation.SourceControl.WebApi;
using Microsoft.VisualStudio.Services.Client;
using Microsoft.VisualStudio.Services.Common;

using System.Text;
using System.IO;
using Newtonsoft.Json;
public class TopChangedFilesReport
{
    public async Task GetTopChangedFilesUsingDirectWebRestApiSO()
    {
        var tfsUrl = "https://<SERVERNAME>/tfs/<COLLECTION>";
        var domain = "<DOMAIN>";
        var password = "<PASSWORD>";
        var userName = "<USERNAME>";

        //Every value substituted into this template is URL-escaped below
        var changesetsUrl = "{0}/_apis/tfvc/changesets?searchCriteria.itemPath={1}&searchCriteria.fromDate={2}&searchCriteria.toDate={3}&$top={4}&api-version=1.0";

        //Criteria used to limit results
        var directoriesToScan = new List<string> {
            "$/projectdir/subdir/subdir/subdirA/systemnameA",
            "$/projectdir/subdir/subdir/subdirB/systemnameB",
            "$/projectdir/subdir/subdir/subdirC/systemnameC",
            "$/projectdir/subdir/subdir/subdirD/systemnameD"
        };

        var maxResultsPerPath = 10;
        //Changesets requested per path ($top), kept separate from maxResultsPerPath (the number
        //of files reported). Illustrative value; the server may apply its own cap
        var maxChangesetsPerPath = 1000;
        var fromDate = DateTime.Now.AddDays(-120);
        var toDate = DateTime.Now;

        var fileExtensionToInclude = new List<string> { ".cs", ".js" };
        var folderPathsToInclude = new List<string> { "/subdirToForceInclude/" };
        var extensionExclusions = new List<string> { ".csproj", ".json", ".css" };
        var fileExclusions = new List<string> { "AssemblyInfo.cs", "jquery-1.12.3.min.js", "config.js" };
        var pathExclusions = new List<string> {
            "/subdirToForceExclude1/",
            "/subdirToForceExclude2/",
            "/subdirToForceExclude3/",
        };

        //Establish connection
        var connection = new VssConnection(new Uri(tfsUrl),
            new VssCredentials(new Microsoft.VisualStudio.Services.Common.WindowsCredential(new NetworkCredential(userName, password, domain))));

        //Get tfvc client
        var tfvcClient = await connection.GetClientAsync<TfvcHttpClient>();

        foreach (var rootDirectory in directoriesToScan)
        {
            //Escape the path ($ and /) and use culture-invariant ISO 8601 dates
            var url = string.Format(changesetsUrl, tfsUrl,
                Uri.EscapeDataString(rootDirectory),
                Uri.EscapeDataString(fromDate.ToString("s", System.Globalization.CultureInfo.InvariantCulture)),
                Uri.EscapeDataString(toDate.ToString("s", System.Globalization.CultureInfo.InvariantCulture)),
                maxChangesetsPerPath);

            //Guard against a response without a value property
            var changeSets = Invoke<GetChangeSetsResponse>("GET", url, userName, password, domain).value
                ?? Enumerable.Empty<GetChangeSetsResponse.Changeset>();

            if (changeSets.Any())
            {
                //Get changes
                var sample = new List<TfvcChange>();
                foreach (var changeSet in changeSets)
                {
                    sample.AddRange(await tfvcClient.GetChangesetChangesAsync(changeSet.changesetId));
                }

                //Keep changes that are not lock, delete or property changes, then filter by path,
                //file name and extension (Path.GetExtension also handles items without a '.')
                var changes = sample
                    .Where(a => a.ChangeType != VersionControlChangeType.Lock && a.ChangeType != VersionControlChangeType.Delete && a.ChangeType != VersionControlChangeType.Property)
                    .Where(e => e.Item.Path.ContainsAny(folderPathsToInclude))
                    .Where(e => !e.Item.Path.ContainsAny(pathExclusions))
                    .Where(e => !Path.GetFileName(e.Item.Path).ContainsAny(fileExclusions))
                    .Where(e => !Path.GetExtension(e.Item.Path).ContainsAny(extensionExclusions))
                    .Where(e => Path.GetExtension(e.Item.Path).ContainsAny(fileExtensionToInclude))
                    .GroupBy(g => g.Item.Path)
                    .Select(d => new { File = d.Key, Count = d.Count() })
                    .OrderByDescending(o => o.Count)
                    .Take(maxResultsPerPath);

                //Write top items for each path to the console
                Console.WriteLine(rootDirectory);
                Console.WriteLine("->");
                foreach (var change in changes)
                {
                    Console.WriteLine("ChangeCount: {0} : File: {1}", change.Count, change.File);
                }
                Console.WriteLine(Environment.NewLine);
            }
        }
    }

    private T Invoke<T>(string method, string url, string userName, string password, string domain)
    {
        var request = WebRequest.Create(url);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null) httpRequest.UserAgent = "versionhistoryApp";
        request.ContentType = "application/json";
        request.Method = method;

        request.Credentials = new NetworkCredential(userName, password, domain); //ntlm 401 challenge support
        request.Headers[HttpRequestHeader.Authorization] = "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes(domain+"\\"+userName + ":" + password)); //basic auth support if enabled on tfs instance

        try
        {
            using (var response = request.GetResponse())
            using (var responseStream = response.GetResponseStream())
            using (var reader = new StreamReader(responseStream))
            {
                string s = reader.ReadToEnd();
                return Deserialize<T>(s);
            }
        }
        catch (WebException ex)
        {
            if (ex.Response == null)
                throw;

            using (var responseStream = ex.Response.GetResponseStream())
            {
                string message;
                try
                {
                    message = new StreamReader(responseStream).ReadToEnd();
                }
                catch
                {
                    throw ex;
                }

                throw new Exception(message, ex);
            }
        }
    }

    public class GetChangeSetsResponse
    {
        public IEnumerable<Changeset> value { get; set; }
        public class Changeset
        {
            public int changesetId { get; set; }
            public string url { get; set; }
            public DateTime createdDate { get; set; }
            public string comment { get; set; }
        }
    }

    public static T Deserialize<T>(string json)
    {
        T data = JsonConvert.DeserializeObject<T>(json);
        return data;
    }
}

Additional References:

C# REST and SOAP (ExtendedClient) api reference

REST API: tfvc Changesets

TfvcChangesetSearchCriteria type (MSDN)

Answered by Elmar; edited by Community.
  • Thanks for your help. However, I found that GetChangesetChangesAsync only retrieves 100 changeset records per call, and the maxChangesToConsider parameter has no effect. Even when paging with the skip parameter, only 250 records are returned. I couldn't find any setting or API that controls this limit. – Allen Apr 08 '16 at 08:40
  • @Allen: I've had to drastically modify my answer to accommodate the issue you identified. Interestingly, your issue led to the identification of a serious defect that forces GetChangesetsAsync to scan $/, which in turn causes the symptom you describe. Review the updated answer for more info. – Elmar Apr 10 '16 at 18:16