I would like to enumerate the strings that are in the string intern pool.

That is to say, I want to get the list of all the instances s of string such that:

string.IsInterned(s) != null
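
To make the condition concrete, here is a minimal sketch of how `string.Intern` and `string.IsInterned` interact with the pool. The class and variable names are made up for illustration, and it assumes a freshly generated GUID string is not already in the pool:

```csharp
using System;

static class InternDemo
{
    static void Main()
    {
        // Built at run time, so it is not a literal and has not been interned.
        string s = Guid.NewGuid().ToString();
        Console.WriteLine(string.IsInterned(s) != null); // False

        // Adding it to the pool makes the condition hold.
        string.Intern(s);
        Console.WriteLine(string.IsInterned(s) != null); // True
    }
}
```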

Does anyone know if it's possible?

weston
Benoit Blanchon
  • Curious: why would you like to do that? – Sriram Sakthivel Mar 04 '14 at 12:40
  • Both research and fun :-) – Benoit Blanchon Mar 04 '14 at 12:41
  • Possibly related, though I don't think there's a direct answer specifically to your question (but considering this is for "research and fun", there's a lot of info): http://stackoverflow.com/questions/2328745/how-do-i-view-net-interned-strings – Chris Sinclair Mar 04 '14 at 12:43
  • Hmm I guess it could be possible via the profiling API (http://msdn.microsoft.com/en-us/library/bb384493(v=vs.110).aspx). However, someone more knowledgeable in this field should provide a detailed answer. – Ondrej Tucny Mar 04 '14 at 12:55
  • So no, .NET does not provide access to the hashtable. It's hidden in internal calls to C++ files in the [SSCLI](http://en.wikipedia.org/wiki/Shared_Source_Common_Language_Infrastructure). And it's only an implementation detail which could change whenever MS wants. I assume that this is also the reason why it's not exposed. – Tim Schmelter Mar 04 '14 at 12:57
  • @TimSchmelter The [callsite](http://referencesource-beta.microsoft.com/#mscorlib/system/appdomain.cs#e46d5e372c4095ab) corroborates this. – Dustin Kingen Mar 04 '14 at 12:59
  • @TimSchmelter What if you walk all objects on the heap using the profiling API, select strings, and call `IsInterned`? – Ondrej Tucny Mar 04 '14 at 13:10
  • @OndrejTucny: I don't know, I have never used the profiling API before. However, I think that even if that could work you would indirectly modify the intern pool by tracking the objects. You could, for example, prevent the garbage collector from removing strings from the pool, hence you'd impact the results. – Tim Schmelter Mar 04 '14 at 13:14
  • This should prove interesting... http://stackoverflow.com/questions/555871/c-strings-with-same-contents – Paul Zahra Mar 04 '14 at 13:27
  • @PaulZahra: which proves my argument wrong that tracking the intern-pool could prevent them from being garbage-collected (J.Skeet says that the intern-pool is not garbage collected as long as the app-domain lives). – Tim Schmelter Mar 04 '14 at 13:35
  • @TimSchmelter I don't think that is strictly correct, they can live longer than that!... "First, the memory allocated for interned String objects is not likely to be released until the common language runtime (CLR) terminates. The reason is that the CLR's reference to the interned String object can persist after your application, or even your application domain, terminates." - Taken from http://msdn.microsoft.com/en-us/library/system.string.intern.aspx – Paul Zahra Mar 04 '14 at 13:37
  • You are 99% there by enumerating the strings in the assembly metadata, IMetaDataImport::EnumUserStrings(). – Hans Passant Mar 04 '14 at 16:16
  • @HansPassant, thanks I'll dig in that direction – Benoit Blanchon Mar 04 '14 at 16:32

2 Answers

Thanks to the advice of @HansPassant, I managed to get the list of string literals in an assembly, which is extremely close to what I originally wanted.

You need to read the assembly metadata and enumerate the user strings. This can be done with these three methods of IMetaDataImport:

[ComImport, Guid("7DAC8207-D3AE-4C75-9B67-92801A497D44")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IMetaDataImport
{
    void CloseEnum(IntPtr hEnum);

    // SizeParamIndex is zero-based and must point at the count parameter (index 2), not at the array itself.
    uint GetUserString(uint stk, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] char[] szString, uint cchString, out uint pchString);

    uint EnumUserStrings(ref IntPtr phEnum, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] uint[] rStrings, uint cmax, out uint pcStrings);

    // interface also contains 62 irrelevant methods
}

To get the instance of IMetaDataImport, you need to get an IMetaDataDispenser:

[ComImport, Guid("809C652E-7396-11D2-9771-00A0C9B4D50C")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
[CoClass(typeof(CorMetaDataDispenser))]
interface IMetaDataDispenser
{
    uint OpenScope([MarshalAs(UnmanagedType.LPWStr)]string szScope, uint dwOpenFlags, ref Guid riid, [MarshalAs(UnmanagedType.Interface)] out object ppIUnk);

    // interface also contains 2 irrelevant methods
}

[ComImport, Guid("E5CB7A31-7512-11D2-89CE-0080C792E5D8")]
class CorMetaDataDispenser
{
}

Here is how it goes:

var dispenser = new IMetaDataDispenser();
var metaDataImportGuid = new Guid("7DAC8207-D3AE-4C75-9B67-92801A497D44");

object scope;
var hr = dispenser.OpenScope(location, 0, ref metaDataImportGuid, out scope);

metaDataImport = (IMetaDataImport)scope;    

where location is the path to the assembly file.

After that, calling EnumUserStrings() and GetUserString() is straightforward.
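
For readers who would rather avoid COM interop, the same user-string heap (#US) can also be walked with the System.Reflection.Metadata library. This is a different technique from the IMetaDataImport calls above, not the method this answer uses. The following is only a sketch: it assumes the library is available (it ships with .NET Core and later, and as a NuGet package for .NET Framework), and `UserStringDump` and `ReadUserStrings` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using System.Reflection.PortableExecutable;

static class UserStringDump
{
    // Returns every non-empty entry of the #US heap of the assembly at 'path'.
    // Empty entries (the one at offset 0 and any alignment padding) are skipped.
    public static List<string> ReadUserStrings(string path)
    {
        var result = new List<string>();
        using (var stream = File.OpenRead(path))
        using (var pe = new PEReader(stream))
        {
            MetadataReader reader = pe.GetMetadataReader();
            if (reader.GetHeapSize(HeapIndex.UserString) == 0)
                return result; // the assembly has no user strings

            // Start at offset 0 and follow the heap entry by entry.
            UserStringHandle handle = MetadataTokens.UserStringHandle(0);
            do
            {
                string value = reader.GetUserString(handle);
                if (value.Length > 0)
                    result.Add(value);
                handle = reader.GetNextHandle(handle);
            } while (!handle.IsNil);
        }
        return result;
    }

    static void Main(string[] args)
    {
        // 'location' is the path to the assembly file; defaults to this program itself.
        string location = args.Length > 0 ? args[0] : typeof(UserStringDump).Assembly.Location;
        foreach (string s in ReadUserStrings(location))
            Console.WriteLine(s);
    }
}
```

Like the COM approach, this lists the literals stored in one assembly file; it does not inspect the intern pool of a running process.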

Here is a blog post with more detail, and a demo project on GitHub.

Benoit Blanchon

The SSCLI function that it's pointing to is:

STRINGREF* AppDomainStringLiteralMap::GetStringLiteral(EEStringData *pStringData)
{ 
    ... 
    DWORD dwHash = m_StringToEntryHashTable->GetHash(pStringData);
    if (m_StringToEntryHashTable->GetValue(pStringData, &Data, dwHash))
    {
        STRINGREF *pStrObj = NULL;
        pStrObj = ((StringLiteralEntry*)Data)->GetStringObject();
        _ASSERTE(!bAddIfNotFound || pStrObj);
        return pStrObj;
    }
    else { ... }

    return NULL; //Here, if this returns, the string is not interned
}

If you manage to find the native address of m_StringToEntryHashTable, you can enumerate the strings that exist.

fiinix