I have a column with several measures, all of them in a single row, as in the image below:

[image]

I am trying to cast this str column to float or numeric so that I can perform calculations such as the mean, plot histograms, and so on.

I tried `df['nums'] = df['nums'].astype(float)` and got this error:

[image: error message]

How can I fix it?

Thanks

merchmallow
  • `out=df['nums'].str.split(' ',expand=True).astype(float)`? By the way, what are you trying to achieve? – Anurag Dabas Jul 06 '21 at 17:22
  • The reason you get an error is that you cannot convert a long string of many floats into a single float – The shape Jul 06 '21 at 17:22
  • @AnuragDabas I get the result I am looking for, but then I get 100 more columns in my DF. Is there a way to get all these 100 values in the same column, like a vector/array? – merchmallow Jul 06 '21 at 17:24
  • You can wrap them in a container, i.e. a list. Try: `out=df['nums'].str.split(' ',expand=True).astype(float).values.tolist()` – Anurag Dabas Jul 06 '21 at 17:26
  • @AnuragDabas Nice! They are all in the same column but still as str, because I still can't perform calculations such as mean(). – merchmallow Jul 06 '21 at 17:33
  • Try `df['newnums'].map(lambda x:np.mean(x))` – Anurag Dabas Jul 06 '21 at 17:42
  • Same error as before: cannot perform reduce with flexible type. Besides the mean, I would like to plot a histogram too. I can't find a way to do it. – merchmallow Jul 06 '21 at 17:46
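
The point made in the comments above can be checked on a small example. This is a minimal sketch on made-up data (only the column name `nums` comes from the question; the values are hypothetical): casting the whole string fails, while splitting first and casting each piece works.

```python
import pandas as pd

# Hypothetical data: each cell holds several space-separated numbers as one string
df = pd.DataFrame({"nums": ["1.0 2.5 4.0", "3.0 3.5 5.0"]})

try:
    df["nums"].astype(float)  # tries to parse "1.0 2.5 4.0" as ONE float
    cast_failed = False
except ValueError:
    cast_failed = True

# Split on whitespace first, then cast every piece to float
wide = df["nums"].str.split(expand=True).astype(float)

# One mean per original row
row_means = wide.mean(axis=1)
```

The intermediate `wide` frame has one column per value, which is the "100 more columns" effect mentioned in the comments; it does not have to be attached to the original dataframe.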

1 Answer

I don't know whether this gives you what you're looking for, but I'd use a mixed approach, since pandas DataFrames are usually seen and used as tables with one value per cell.

So I'd add a twist: a dictionary whose keys are your index or an "id". You could then perform calculations on that dictionary and link the results back to your original dataframe if necessary.

numpy and pandas are required for the code I wrote:

import numpy as np
import pandas as pd

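The code to get the dict from the dataframe could look like this:

def init_operations_on_data(df):
    """Return a dict mapping each row's id to its list of float values."""
    # Split each string on spaces and convert the pieces to float
    # (from Anurag Dabas' comments)
    list_ = df['nums'].str.split(' ', expand=True).astype(float).values.tolist()
    # Here I took the first column of the dataframe df as an "id",
    # but you can replace it with the index (df.index).
    # zip pairs ids and rows by position, so any index labels work;
    # Series.iteritems() was removed in pandas 2.0.
    df_dct = dict(zip(df.iloc[:, 0], list_))
    print(df_dct)

    # return a dict object
    return df_dct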

To calculate the mean or any other statistic, you can write small functions that call the function above:

def mean_on_rows(df):
    # dataframe as a dict: id -> list of floats
    df_dct = init_operations_on_data(df)
    l_mean = []
    for key in df_dct:
        # nanmean ignores the NaN padding of rows that have fewer values
        l_mean.append(np.nanmean(df_dct[key]))
    # Here the link-up between the keys and the id/index in the dataframe
    # is not completely secured (it relies on unique ids and on the dict
    # keeping the row order) and might need a closer look.
    df['mean'] = l_mean
    print(df)

    return df
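
As an alternative to the dictionary, and in line with the request in the comments to keep all values of a row in one cell, the strings can be turned into NumPy arrays stored in an object column. This is only a sketch on made-up data: the column names `id`, `arr` and `mean` are assumptions, and it is not the approach used in the functions above.

```python
import numpy as np
import pandas as pd

# Hypothetical data; rows may hold different numbers of values
df = pd.DataFrame({"id": ["a", "b"], "nums": ["1.0 2.0 3.0", "10.0 20.0"]})

# One float array per cell (the resulting column has dtype object)
df["arr"] = df["nums"].apply(lambda s: np.array(s.split(), dtype=float))

# Per-row statistics are then an element-wise apply over the arrays
df["mean"] = df["arr"].apply(lambda a: float(a.mean()))
```

Because each cell keeps its own array, rows of different lengths need no NaN padding, at the cost of losing pandas' vectorised column operations on that column.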

Having a dict with the list of values for each row (shorter rows are padded with NaN by the split above) would also let you draw bar plots and histograms easily.

See this discussion, for example: How to make a histogram from a list of data
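
To illustrate the histogram part: once the values are floats, they can be flattened into one array and binned. The sketch below uses made-up data and `np.histogram` to compute the counts; the helper name `plot_hist` is hypothetical, and it assumes matplotlib is installed if you call it.

```python
import numpy as np
import pandas as pd

# Hypothetical data in the same shape as the question's column
df = pd.DataFrame({"nums": ["1.0 2.0 2.5", "3.0 3.5 9.0"]})

# Split, cast to float, and flatten all rows into a single 1-D array
values = df["nums"].str.split(expand=True).astype(float).to_numpy().ravel()
values = values[~np.isnan(values)]  # drop the NaN padding of shorter rows

# Bin counts and bin edges, computed without plotting anything
counts, edges = np.histogram(values, bins=4)


def plot_hist(values, bins=4):
    """Draw the same histogram with matplotlib (imported lazily)."""
    import matplotlib.pyplot as plt

    plt.hist(values, bins=bins)
    plt.xlabel("value")
    plt.ylabel("count")
    plt.show()
```

Calling `plot_hist(values)` draws the histogram for all rows together; passing the list of a single row instead would give one histogram per row.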

marc_s
LaTouwne