3

I realize my wording for the title isn't the best, but I hope an example will clear this up.

How would I convert a list like

example_list = ["asdf", "4", "asdfasdf", "8", "9", "asdf"]

to a list like

converted_list = ["asdf", 4, "asdfasdf", 8, 9, "asdf"]

So basically, how do I make a list where strings that can be converted to integers are converted, while strings that cannot be converted remain strings?

As a side note, how would I afterwards test in a for loop if each item in the converted_list is an integer or not?

For context, I am trying to convert pandas column headers to integers where possible, since all the numeric headers are currently strings. Then, for each column whose header was a numeric string, I want to take the mean of that column. So far, I have put all the headers into a list.

ctj232
Matthew

2 Answers

6

You can use a list comprehension with a conditional (ternary) expression that converts an element to an integer only when the string consists entirely of digits.

>>> [int(n) if n.isdigit() else n for n in example_list]
['asdf', 4, 'asdfasdf', 8, 9, 'asdf']
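
One caveat: `str.isdigit()` is only true for strings made up entirely of digits, so a header such as "-4" would stay a string. If that matters, a possible alternative is to attempt the conversion and fall back to the original value when it fails. This is only a sketch; the helper name `to_int_if_possible` is made up for illustration. The loop at the end shows one way to do the `isinstance` check from the side question.

```python
def to_int_if_possible(value):
    """Return int(value) if the conversion succeeds, otherwise value unchanged."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return value


example_list = ["asdf", "4", "asdfasdf", "8", "9", "asdf"]
converted_list = [to_int_if_possible(s) for s in example_list]

# Side question: test each item's type inside a for loop.
for item in converted_list:
    if isinstance(item, int):
        print(item, "is an integer")
    else:
        print(item, "is still a string")
```

Note that `int()` also accepts surrounding whitespace and a leading sign, so this version converts more strings than the `isdigit()` check does.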
Alexander
  • I used a clumsy way of forming the dictionary. But you can use your insight in a dictionary comprehension inside a `rename` call: `df.rename(columns={k: int(k) for k in df.columns if k.isdigit()})`. I'll likely start using this (-: – piRSquared Aug 01 '17 at 18:01
  • Thanks for the helpful answer. Is there a reason why this code keeps giving `TypeError: must be str, not int`? First `data_frame_names = list(df.columns.values)`, then `digitize = [int(n) if n.isdigit() else n for n in data_frame_names]`, then a loop `for x in digitize:` that runs `print(df[df.columns[x]].mean())` when `isinstance(x, int)` is true and does nothing otherwise. – Matthew Aug 01 '17 at 18:31
  • Did you first rename your columns? See comment from @piRSquared above. Note that you need to reassign the result, e.g. `df = df.rename(...)`. – Alexander Aug 01 '17 at 18:34
  • When I imported all the CSV files I renamed duplicates, but do I need to rename them again? Also, what do you mean by reassigning the result? Do I need to do something after your method to be able to use the new list in a for loop? Sorry for needing clarification, I am new to Python. Edit: I renamed the columns again and I am still getting this error. – Matthew Aug 01 '17 at 18:41
  • `df.columns = [int(n) if n.isdigit() else n for n in df]`, then `print(df[x].mean())` – Alexander Aug 01 '17 at 18:45
  • That works, but for some reason it only gives me the mean of one column (I don't know which), and I need the mean of each of several columns (separate means, not one overall mean). When I use the for loop now, I get `'int' object has no attribute 'isdigit'`. Is there anything wrong with my for loop for finding the columns with digits as headers? It is `for x in data_frame_names:` followed by `print(df[df.columns[x]].mean())` when `isinstance(x, int)` is true. – Matthew Aug 01 '17 at 18:52
  • `df[[col for col in df if isinstance(col, int)]].mean()` – Alexander Aug 01 '17 at 19:23
  • Thanks, everything seems to be working fine now – Matthew Aug 01 '17 at 20:01
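
The steps worked out in the comments above can be put together roughly as follows. This is a sketch on a made-up frame with unique headers, not the asker's actual data. One thing to keep in mind: `df.columns[x]` treats `x` as a position, whereas `df[x]` looks up the column labelled `x`, so after the conversion the label lookup is the one to use.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two headers are digit strings, two are not.
df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    columns=["name", "4", "other", "8"],
)

# Assigning to df.columns replaces the labels on df itself.
df.columns = [int(c) if c.isdigit() else c for c in df.columns]

# Keep only the integer-labelled columns and take one mean per column.
int_cols = [c for c in df.columns if isinstance(c, int)]
means = df[int_cols].mean()
print(means)
```

The result `means` is a Series indexed by the integer headers, so `means.loc[4]` gives the mean of the column labelled `4`.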
2

Setup

import numpy as np
import pandas as pd

example_list = ["asdf", "4", "asdfasdf", "8", "9", "asdf"]

df = pd.DataFrame(np.arange(24).reshape(4, 6), columns=example_list)

df

   asdf   4  asdfasdf   8   9  asdf
0     0   1         2   3   4     5
1     6   7         8   9  10    11
2    12  13        14  15  16    17
3    18  19        20  21  22    23

Convert Headers

df.rename(columns={k: int(k) for k in df.columns[df.columns.str.isdigit()]})

   asdf   4  asdfasdf   8   9  asdf
0     0   1         2   3   4     5
1     6   7         8   9  10    11
2    12  13        14  15  16    17
3    18  19        20  21  22    23

Note
@Alexander's use of the string method `isdigit` within a list comprehension is extremely useful. We can improve this answer by combining it with his.

df.rename(columns={k: int(k) for k in df.columns if k.isdigit()})

Look at Types

df.rename(
    columns={k: int(k) for k in df.columns[df.columns.str.isdigit()]}
).columns.map(type)

Index([<class 'str'>, <class 'int'>, <class 'str'>, <class 'int'>,
       <class 'int'>, <class 'str'>],
      dtype='object')
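
Because `rename` returns a new DataFrame by default, the converted headers only persist if the result is kept. A small sketch using the same setup as above; the names `mapping` and `renamed` are just illustrative.

```python
import numpy as np
import pandas as pd

example_list = ["asdf", "4", "asdfasdf", "8", "9", "asdf"]
df = pd.DataFrame(np.arange(24).reshape(4, 6), columns=example_list)

# Map each digit-only header to its integer form.
mapping = {k: int(k) for k in df.columns if k.isdigit()}

# rename leaves df untouched and returns a renamed copy.
renamed = df.rename(columns=mapping)

print(df.columns.tolist())       # original string headers
print(renamed.columns.tolist())  # digit headers are now ints
```

Writing `df = df.rename(columns=mapping)` instead would replace the original frame with the renamed one.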
piRSquared