0

I'm trying to access a column in a pandas dataframe in python.

Here is the dataframe I am using. I only printed the first row in order to try to keep it a manageable size. I would have selected a smaller number of columns but you will see that this is my problem and the reason I'm asking this question.

{'OilSampleID': {0: 'TGSM3059'}, 'Type': {0: 'Oil Seep'}, 'Country': {0: 'Mexico'}, 'USGS Province': {0: 'Ocean'}, 'State / Province': {0: 'GOM'}, 'County / Block': {0: nan}, 'Block': {0: 'TGS Area 2'}, 'Section': {0: '11'}, 'Well': {0: 'VER115'}, 'Upper Depth (ft)': {0: nan}, 'Lower Depth (ft)': {0: nan}, 'Standardized Reservoir Age': {0: nan}, 'Reservoir Age': {0: nan}, 'Standardized Reservoir Formation': {0: nan}, 'Reservoir Formation': {0: nan}, 'Lithology': {0: nan}, 'Operator': {0: 'TDI-Brooks'}, 'Location': {0: nan}, 'Latitude': {0: 21.712965}, 'Longitude': {0: -96.280545}, 'Datum': {0: 'WGS84 UTM15N'}, 'API / UWI Well #': {0: nan}, 'Comments': {0: 'Piston Core Sediment'}, 'SampleID': {0: nan}, 'ClientID': {0: nan}, 'API Gravity': {0: nan}, '<C15': {0: nan}, '% S': {0: nan}, 'ppm Ni': {0: nan}, 'ppm V': {0: nan}, '% Sat': {0: 10.3448275862249}, '% Aro': {0: 13.7931034482896}, '% NSO': {0: 48.2758620689676}, '% Asph': {0: 27.5862068965179}, 'Sat / Aro': {0: 0.75}, '13Cs': {0: nan}, '13Ca': {0: -27.51}, '13Cwc': {0: nan}, '34Swc': {0: nan}, 'CV': {0: nan}, 'EOM': {0: 193.0}, 'Misc': {0: 'TSF = 150,000'}, 'Misc.1': {0: 150000.0}, 'SRType': {0: 'Marine Carbonate/Marl'}, 'SRTconf': {0: 'suspected'}, 'SRAge': {0: 'UJ/LK'}, 'SRAconf': {0: 'suspected'}, 'SRM': {0: nan}, 'BIOD': {0: 'Mild 4'}, 'BIOD_': {0: 4.0}, 'GM REF': {0: nan}, 'GM Fam': {0: nan}, 'C19/C23': {0: 0.018}, 'C22/C21': {0: 1.176}, 'C24/C23': {0: 0.405}, 'C26/C25': {0: 0.672}, 'Tet/C23': {0: 0.471}, 'C27T/C27': {0: 0.015}, 'C28/H': {0: 0.029}, 'C29/H': {0: 0.988}, 'X/H': {0: 0.013}, 'OL/H': {0: 0.009}, 'C31R/H': {0: 0.408}, 'GA/C31R': {0: 0.06}, 'C35S/C34S': {0: 1.131}, 'S/H': {0: 0.242}, '% C27': {0: 23.975}, '% C28': {0: 28.596}, '% C29': {0: 47.429}, 'S1/S6': {0: 0.887}, 'C29 20S/R': {0: 0.63}, 'C29 bbS/aaR': {0: 1.06}, 'C27 Ts/Tm': {0: 0.382}, 'C29 Ts/Tm': {0: 0.1}, 'DM/H': {0: 0.012}, 'C26(S+R) / Ts': {0: 0.316}, 'P': {0: 8.98723829264705}, '3MP': {0: 4.01903139633122}, '2MP': {0: 5.318803252166}, '9MP': {0: 5.38721229720994}, '1MP': {0: 3.85655991435188}, 'C4N': {0: 0.43610766215509}, 'DBT': {0: 0.0}, '4MDBT': {0: 0.256533918914759}, '32MDBT': {0: nan}, '1MDBT': {0: nan}, 'C20TAS': {0: 1.65036821168495}, 'C21TAS': {0: 2.32590753149381}, 'C26STAS': {0: 12.6898778556501}, 'C26RC27STAS': {0: 62.243679859351}, 'C28STAS': {0: 52.8801918189623}, 'C27RTAS': {0: 42.2254830533693}, 'C28RTAS': {0: 46.860195855096}, '#9b': {0: 8.0}, '#3e': {0: 11.0}, 'C18': {0: 0.0}, 'C19': {0: 1.0}, 'C20': {0: 1.0}, 'MPI': {0: 0.768292682926829}, 'F1': {0: 0.50253106304648}, 'F2': {0: 0.286240220892775}, 'P/DBT': {0: nan}, 'DBT/C4N': {0: nan}, 'MDR': {0: nan}, 'TAS1': {0: 0.0376145000974469}, 'TAS2': {0: 0.0472878998609179}, 'TAS3(CR)': {0: 0.0180023228803717}, 'TAS4': {0: 0.239974126778784}, 'TAS5': {0: 0.901094890510949}, 'DINO3/9': {0: 1.3740932642487}, 'C19T': {0: 0.181692648715433}, 'C20T': {0: 0.622946224167199}, 'C21T': {0: 1.62225579210208}, 'C22T': {0: 1.90777281151205}, 'C23T': {0: 9.92820544766473}, 'C24T': {0: 4.02319436441316}, 'C25S': {0: 2.336048340627}, 'C25R': {0: 2.336048340627}, 'TET': {0: 4.67209668125399}, 'C26S': {0: 1.60927774576526}, 'C26R': {0: 1.53140946774436}, 'ET_C28S': {0: 1.03824370694533}, 'ET_C28R': {0: 1.23291440199758}, 'ET_C29S': {0: 2.01159718220658}, 'ET_C29R': {0: 3.02388479647828}, 'ET_C30S': {0: 1.36269486536575}, 'ET_C30R': {0: 5.84012085156749}, 'ET_C31S': {0: 1.14206807763986}, 'ET_C31R': {0: 0.584012085156749}, 'Ts': {0: 9.92820544766473}, 'C27T': {0: 0.545077946146299}, 'Tm': {0: 26.0209829053174}, 'C28DM': {0: 2.40093857231108}, 'C28H': {0: 2.79027996241558}, 'C29DM': {0: 1.12909003130305}, 'C29H': {0: 96.2451916338322}, 'C29D': {0: 9.60375428924431}, 'C30X': {0: 1.29780463368166}, 'OL': {0: 0.882507150903532}, 'C30H': {0: 97.4391718968193}, 'C30M': {0: 8.70826909200397}, 'C31S': {0: 56.6751283528783}, 'C31R': {0: 39.7387778833326}, 'GA': {0: 2.38796052597426}, 'C32S': {0: 27.461546048704}, 'C32R': {0: 18.0135283155015}, 'C33S': {0: 20.6610497682121}, 'C33R': {0: 12.6146610393858}, 'C34S': {0: 12.458924483344}, 'C34R': {0: 7.7219375704059}, 'C35S': {0: 14.0941583217829}, 'C35R': {0: 8.53955448962535}, 'S1': {0: 7.30404447836041}, 'S2': {0: 4.12442312584033}, 'S3': {0: 4.85119372070206}, 'S4': {0: 11.337621279843}, 'S4B': {0: 10.4109887713943}, 'S5': {0: 6.32290417529707}, 'S5B': {0: 7.54024492169047}, 'S6': {0: 8.23067698680911}, 'S7': {0: 2.54369708201606}, 'S8': {0: 3.83371488789564}, 'S9': {0: 6.23205785093935}, 'S9B': {0: 8.394200370653}, 'S10': {0: 7.19502888913115}, 'S10B': {0: 8.99378611141393}, 'S11': {0: 3.67019150405175}, 'S12': {0: 7.4675678622043}, 'S13': {0: 14.0085032159599}, 'S13B': {0: 16.0071223518296}, 'S14': {0: 12.4641157018787}, 'S14B': {0: 14.916966459537}, 'ISTD': {0: 500.0}, 'S15': {0: 11.7918529016316}, 'Biodegraded': {0: nan}, '15': {0: nan}, '16': {0: nan}, '17': {0: nan}, 'Pr': {0: nan}, '18': {0: nan}, 'Ph': {0: nan}, '19': {0: nan}, '20': {0: nan}, '21': {0: nan}, '22': {0: nan}, '23': {0: nan}, '24': {0: nan}, '25': {0: nan}, '26': {0: nan}, '27': {0: nan}, '28': {0: nan}, '29': {0: nan}, '30': {0: nan}, '31': {0: nan}, '32': {0: nan}, '33': {0: nan}, '34': {0: nan}, '35': {0: nan}, 'Pr/Ph': {0: nan}, 'Pr/nC17': {0: nan}, 'Ph/nC18': {0: nan}, 'nC27/nC17': {0: nan}, 'nC19*2/(nC18+nC20)': {0: nan}, 'CPI': {0: nan}, 'S1/S6.1': {0: 0.887}, 'DWW D/R': {0: 1.1739236190043156}, 'DWW Ts/(Ts+Tm)': {0: 0.27617328519855566}, 'DWW 35SH/34SH': {0: 1.131}, 'DWW 29H/H': {0: 0.988}, 'DWW OL/H': {0: 0.009}, 'DWW GA/H': {0: 0.02450719232818326}, 'DWW 31H/H': {0: 0.9894778902504008}, 'DWW 29H/31H': {0: 0.9982501009557132}, 'DWW %27 St': {0: 23.975}, 'DWW %28 St': {0: 28.596}, 'DWW %29 St': {0: 47.429}, 'DWW M/H': {0: 0.08937133724027711}, 'DWW St/Ho': {0: 0.242}, 'DWW 29S/R': {0: 0.63}, 'Family': {0: 2}}

I looked on this StackOverflow post to try and find the answer, but I think I'm running into problems because my data has headers with spaces with them.

I followed the other example to try and select just some of the columns. Here is my code to try and select only a few.

geology_data_selection = geology_data[['OilSampleID', 'Type', 'Country', 'USGS Province', 'Well', 'Latitude', 'Longitude', 'EOM', 'Misc...43', 'BIOD', '% Sat', '% Aro', '% NSO', '% Asph']]

I was thinking the problem is that some of these columns like the % Sat column has a space in it. One of the comments under that SO post told me to use backpacks, so I tried using backpacks but that didn't work either.

geology_data_selection = geology_data[['OilSampleID', 'Type', 'Country', '`USGS Province`', 'Well', 'Latitude', 'Longitude', 'EOM', 'Misc...43', 'BIOD', '`% Sat`', '`% Aro`', '`% NSO`', '`% Asph`']]

How can I select columns from a pandas data frame explicitly when the column names are all that great (some of them with the ... and some of them with the spaces).

hachiko
  • 671
  • 7
  • 20
  • 2
    What is the issue? You have to describe the problem you are having, we can't guess. The first approach should work assuming that the DataFrame contains all those columns. It doesn't work with the example given because your DataFrame doesn't contain a column named `'Misc...43'`. That is what the error `KeyError: "['Misc...43'] not in index"` tells you. – Rodalm May 11 '22 at 21:33
  • I realize this wasn't the best question on SO. Somehow, I didn't see what looks like a totally obvious error message. Maybe because I'm using python in RStudio? I don't know. Thanks for the response, though. Is there any different between the comma and a backtick in this case? 'Misc...43' and `Misc...43` – hachiko May 12 '22 at 00:17
  • I do not know where you saw the backticks being used... But it won't work (unless the column labels contain backticks) because the backticks are matched literally. – Rodalm May 12 '22 at 11:15

1 Answers1

0

If you supply keys that exist... this works just fine.

df[['OilSampleID', 'Type', 'Country', 'USGS Province', 'Well', 'Latitude', 'Longitude', 'EOM', 'BIOD', '% Sat', '% Aro', '% NSO', '% Asph']]

Output:

  OilSampleID      Type Country USGS Province    Well   Latitude  Longitude    EOM    BIOD      % Sat      % Aro      % NSO     % Asph
0    TGSM3059  Oil Seep  Mexico         Ocean  VER115  21.712965 -96.280545  193.0  Mild 4  10.344828  13.793103  48.275862  27.586207
BeRT2me
  • 12,699
  • 2
  • 13
  • 31