
I have a DataFrame containing string objects that look like integers, datetimes, and floats.

The appearance of my DataFrame:

   A    B      C        D        E.....................φ
1-Int  NaN  Str Obj  Datetime   NaN...............Mixed Obj
2-NaN Float Str Obj  Datetime Category................NaN
3-Int Float   NaN    Datetime Category............Mixed Obj
.  .   .       .         .       .                     .
.  .   .       .         .       .                     .
.  .   .       .         .       .                     .
Z-Int Float Str Obj     NaN   Category............Mixed Obj

The actual contents and structure of it:

   A         B       C        D         E.....................φ
1-Str Obj   NaN    Str Obj  Str Obj   Str Obj............Mixed Obj
2-  NaN    Str Obj Str Obj  Str Obj   Str Obj................NaN
3-Str Obj  Str Obj  NaN     Str Obj   Str Obj............Mixed Obj
.    .        .       .        .         .                    .
.    .        .       .        .         .                    .
.    .        .       .        .         .                    .
Z-Str Obj  Str Obj Str Obj    NaN     Str Obj............Mixed Obj

I attempted to access the string objects to see if I could change them:

df = df.select_dtypes(includes='object').where(~(r'\d+\\\\\d+\\\\\d+'), datetime)

I wanted to detect the datetime strings and convert them to datetime values. This failed because the `where` method expects a boolean condition, not a regex string. How can I detect datetimes, ints, or floats stored as strings and convert them from string objects into their proper types?
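To illustrate the kind of recognition step being asked about, here is a minimal sketch that turns a regex into a boolean mask with `Series.str.fullmatch`, which is the form of condition `where` can accept. The sample data, the column names, the `matches` helper, and the three patterns are assumptions made for illustration; they are not taken from the actual CSV.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": ["1", None, "3"],
    "B": [None, "2.5", "3.75"],
    "D": ["2021-08-01", "2021-08-02", None],
})

# Assumed patterns for an integer, a float, and an ISO-style date.
INT_PATTERN = r"[+-]?\d+"
FLOAT_PATTERN = r"[+-]?\d*\.\d+"
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"


def matches(series, pattern):
    """Return a boolean mask that is True where the whole string fits the pattern."""
    # na=False makes missing values count as "no match" instead of NaN.
    return series.str.fullmatch(pattern, na=False)


int_mask = matches(df["A"], INT_PATTERN)
float_mask = matches(df["B"], FLOAT_PATTERN)
date_mask = matches(df["D"], DATE_PATTERN)

# A boolean mask, unlike a raw regex string, is a valid condition for where().
only_dates = df["D"].where(date_mask)
```

The masks only answer "does this string look like type X"; the conversion itself would still go through something like `pd.to_numeric` or `pd.to_datetime`.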

  • Does this help? https://stackoverflow.com/questions/36462257/create-empty-dataframe-in-pandas-specifying-column-types – sehan2 Aug 01 '21 at 13:47
  • @sehan2 I do not have an empty dataframe. The DataFrame has been read in from a CSV file. I also see a lot of solutions which hardcode in the values of the column. I wish to recognize patterns inside of strings and then convert the data type or perform operations on the column accordingly. One or two of the rows from my dataset have been properly read in, the rest have not been. I wish to rectify the ones which haven't been read in properly. –  Aug 01 '21 at 13:59
  • Does this help? https://stackoverflow.com/questions/15891038/change-column-type-in-pandas – sehan2 Aug 01 '21 at 14:01
  • @sehan2 I appreciate this, but how do I recognize the contents of the string itself and if the strings fit a certain form, convert them? I have the conversion part thanks to the links, but what about the recognition component? These strings have formats which contain integers, dates, or floats. How can I recognize them? –  Aug 01 '21 at 14:25
  • https://docs.python.org/3/library/re.html – sehan2 Aug 01 '21 at 14:27
  • Okay, but I can't pass a string. Re methods are for string objects, plus it requires (pattern, object). How would you pass a `df.loc[:,col]` obj inside of a re method? It does not seem logical. –  Aug 01 '21 at 14:33
  • well you want to check if a string fits a specific type. Therefore go through each entry of your dataframe. – sehan2 Aug 01 '21 at 14:51
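As a possible alternative to looping over every entry with `re`, the sketch below tries each conversion on a whole column and keeps it only if every non-missing value parses. The `convert_column` helper, the sample data, and the `%Y-%m-%d` date format are assumptions for illustration; the real date format in the CSV may differ.

```python
import pandas as pd

# Assumed date layout; adjust to whatever the real strings look like.
DATE_FORMAT = "%Y-%m-%d"


def convert_column(series):
    """Try numeric, then datetime; return the original column if neither fits."""
    non_null = series.notna().sum()
    if non_null == 0:
        return series

    # errors="coerce" turns unparseable strings into NaN instead of raising.
    numeric = pd.to_numeric(series, errors="coerce")
    if numeric.notna().sum() == non_null:
        # Use the nullable Int64 dtype when every parsed value is a whole number.
        if (numeric.dropna() % 1 == 0).all():
            return numeric.astype("Int64")
        return numeric

    dates = pd.to_datetime(series, format=DATE_FORMAT, errors="coerce")
    if dates.notna().sum() == non_null:
        return dates

    return series


# Hypothetical sample data standing in for the CSV contents.
df = pd.DataFrame({
    "A": ["1", None, "3"],
    "B": [None, "2.5", "3.75"],
    "C": ["x", "y", None],
    "D": ["2021-08-01", "2021-08-02", None],
})

converted = pd.DataFrame({col: convert_column(df[col]) for col in df.columns})
```

Requiring every non-missing value to parse is a deliberately strict choice; a column with genuinely mixed content is left as strings rather than partly converted.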
