
Hi, I have a problem converting a list of objects to a list of integers. The objects are in the "stopsequence" column of the Pandas DataFrame "Kanten". I get this column after importing a CSV file and cleaning the data in that column. I am using Python 3.x.

I am a Python newbie; maybe that's part of the problem here.

import pandas as pd
import numpy as np
import os
import re
import ast
orgn_csv = pd.read_csv(r"Placeholder path for csv file")
df = orgn_csv.dropna()
Kanten = pd.DataFrame({"stopsequence" : df.stopsequence})

# In between is a block in which I use regular expressions for data cleaning purposes.
# I left the data cleaning block out to make the post shorter


Kanten.stopsequence = Kanten.stopsequence.str.split(',')
print(Kanten.head())
print(Kanten.stopsequence.dtype)

This gives the following output:

                                        stopsequence
2  [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
3  [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
4  [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
5  [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
6  [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
object
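Before choosing a conversion, it can help to check what is actually stored in the column. A minimal sketch with two made-up rows (the real data and the omitted regex cleaning are assumptions here): after `str.split(',')` the `object` dtype means each cell holds a Python list, and the items in those lists are still strings, not numbers.

```python
import pandas as pd

# Hypothetical stand-in for Kanten before the split: one text cell per row.
Kanten = pd.DataFrame(
    {"stopsequence": pd.Series(["67,945,123", "67,945,122"], dtype=object)}
)
Kanten["stopsequence"] = Kanten["stopsequence"].str.split(",")

first_cell = Kanten["stopsequence"].iloc[0]
print(Kanten["stopsequence"].dtype)  # object: pandas stores arbitrary Python objects
print(type(first_cell))              # each cell is a list
print(type(first_cell[0]))           # each list item is a str
```

So the task is really "convert every string inside every list", which is why conversions that treat each cell as a single value fail.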

I am looking for a way to convert the items in these lists, which are stored as objects, to integers. I searched Stack Overflow intensively and tried a bunch of different approaches, but none of them was successful. I tried:

astype(str).astype(int)

Kanten.stopsequence = Kanten.stopsequence.astype(str).astype(int)
This returns:
ValueError: invalid literal for int() with base 10:
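A small sketch of why this particular attempt fails, using made-up values: `astype(str)` does not touch the items inside each list, it turns the whole list into its printed representation, and that text is not a valid integer literal.

```python
import pandas as pd

s = pd.Series([["67", "945"], ["123", "122"]])

# astype(str) converts each *list* into text such as "['67', '945']".
as_text = s.astype(str)
print(as_text.iloc[0])  # ['67', '945']

failed = False
try:
    as_text.astype(int)  # int() cannot parse the bracketed text
except (ValueError, TypeError) as err:
    failed = True
    print(type(err).__name__, err)
```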

I adapted the following post, using atoi instead of atof:

Kanten.stopsequence.applymap(atoi)
This returns:
AttributeError: 'Series' object has no attribute 'applymap'
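The AttributeError comes from `applymap` being a DataFrame method (pandas 2.1 renamed it `DataFrame.map`); a Series has `apply` and `map` instead. A sketch on made-up data: `apply` calls the function once per cell, and because each cell is a list, the function converts every item of that list.

```python
import pandas as pd

Kanten = pd.DataFrame({"stopsequence": [["67", "945", "123"], ["67", " 945", "122"]]})

# One call per cell; each cell is a list of strings, so convert item by item.
# int() tolerates surrounding whitespace such as " 945".
Kanten["stopsequence"] = Kanten["stopsequence"].apply(lambda seq: [int(x) for x in seq])
print(Kanten["stopsequence"].iloc[1])  # [67, 945, 122]
```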

list(map())

Kanten.stopsequence = list(map(int, Kanten.stopsequence))
This returns:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
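This error follows from how iteration works: looping over a Series yields its cells, so `map(int, Kanten.stopsequence)` calls `int()` on a whole list. A sketch with made-up values that maps one level deeper, inside each list:

```python
import pandas as pd

stopsequence = pd.Series([["67", "945"], ["123", "122", "996"]])

# Outer loop: one list per row. Inner map: int() on each string in that list.
list_of_lists = [list(map(int, seq)) for seq in stopsequence]
print(list_of_lists)  # [[67, 945], [123, 122, 996]]
```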

apply(ast.literal_eval)

Kanten.stopsequence = Kanten.stopsequence.apply(ast.literal_eval)
This returns:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
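`ast.literal_eval` parses a *string* containing a Python literal, so it would have to run on the text before `str.split(',')`, not on lists that have already been split. A minimal sketch, assuming (hypothetically) that the cleaned cells look like "67,945,123"; wrapping each string in brackets makes every row parse as a list of ints:

```python
import ast
import pandas as pd

# Hypothetical cleaned column *before* the split: one comma-separated string per row.
raw = pd.Series(["67,945,123", "67, 945, 122"])

# "[67,945,123]" is a valid Python list literal, so literal_eval returns a list of ints.
parsed = raw.apply(lambda s: ast.literal_eval("[" + s + "]"))
print(parsed.tolist())  # [[67, 945, 123], [67, 945, 122]]
```

Numbers written with leading zeros (e.g. "067") are not valid Python literals and would make `literal_eval` fail, so converting with `int()` is the more forgiving choice for such data.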

Does anybody see a solution for this? I am unsure whether it's a complicated case or I just lack programming experience. If possible, a short explanation would be helpful, so that I can find a solution myself next time. Thank you in advance.

Max
  • A sample of `stopsequence`? – DirtyBit Mar 13 '19 at 14:46
  • You say you import this data from a CSV file. Why are you not creating the `DataFrame` directly from the CSV? – emporerblk Mar 13 '19 at 14:57
  • @DirtyBit Do you mean a sample of the original values which I loaded in from the CSV file? – Max Mar 13 '19 at 14:58
  • @emporerblk I added the `read_csv` as well to make it more clear what I am doing. I am kind of learning while I am doing it. So it is definitely possible that I am doing things which are not 'ideal'. – Max Mar 13 '19 at 15:11
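Following up on building the DataFrame directly from the CSV: `read_csv` accepts a `converters` dictionary that runs a function on every raw cell of a column while reading. A sketch with made-up file content; `parse_stops` is a hypothetical helper, and the regex cleaning from the post would have to go inside it.

```python
import io
import pandas as pd

# Hypothetical CSV content; the real file and its cleaning steps are not shown in the post.
csv_text = '''id,stopsequence
2,"67,945,123"
3,"67,945,122"
'''

def parse_stops(cell):
    """Hypothetical helper: turn a raw cell such as '67,945,123' into [67, 945, 123]."""
    return [int(x) for x in cell.split(",")]

# converters applies parse_stops to each raw stopsequence cell during parsing.
Kanten = pd.read_csv(io.StringIO(csv_text), converters={"stopsequence": parse_stops})
print(Kanten["stopsequence"].iloc[0])  # [67, 945, 123]
```

Because the converter sees the raw text, missing cells would likely reach it as empty strings, so with the real data it would need to handle those before calling `int()`.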

3 Answers


A pandas Series can be trivially converted to a list, and a list of lists can be given as input to create a DataFrame.

I think this could help:

splitted = pd.DataFrame(Kanten.stopsequence.str.split(',').tolist(), index=Kanten.index).astype(int)

This gives you a new DataFrame with the same index as the original one, where each element is in its own column.

If relevant, you could then concatenate the new columns with the original DataFrame:

pd.concat([Kanten, splitted], axis=1)
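A self-contained version of this idea on made-up data, applied to the comma-separated strings and passing the split result through `.tolist()` so the DataFrame constructor receives a list of lists and spreads it into one column per position:

```python
import pandas as pd

# Hypothetical data: the original string column, before any splitting.
Kanten = pd.DataFrame({"stopsequence": ["67,945,123", "67,945,122"]}, index=[2, 3])

# Series of lists -> list of lists -> wide DataFrame (columns 0, 1, 2) -> ints.
splitted = pd.DataFrame(
    Kanten.stopsequence.str.split(",").tolist(), index=Kanten.index
).astype(int)

# Put the original column and the new integer columns side by side.
combined = pd.concat([Kanten, splitted], axis=1)
print(combined)
```

If rows contain different numbers of stops, shorter rows are padded with missing values and `astype(int)` would raise, so this layout suits sequences of equal length.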
Serge Ballesta

So, from your second attempt, the error message tells you that Kanten.stopsequence is a Series, not a DataFrame; applymap only exists on DataFrames. To convert the Series into a list of integer lists, you could go through NumPy:

list_of_lists = np.array(Kanten.stopsequence.tolist()).astype('int32').tolist()

Note that for your data this creates a nested 2D array, so every row must contain the same number of stops. To access the first integer from the first row, you would write list_of_lists[0][0].
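A runnable sketch of the NumPy route on made-up, equal-length rows: stacking the lists gives a 2-D array of strings, `astype("int32")` parses each string, and `tolist()` turns the result back into nested Python lists.

```python
import numpy as np
import pandas as pd

stopsequence = pd.Series([["67", "945", "123"], ["67", "945", "122"]])

# Equal-length lists stack into a 2-D string array of shape (rows, stops);
# astype("int32") then parses every string as an integer.
arr = np.array(stopsequence.tolist()).astype("int32")
list_of_lists = arr.tolist()

print(arr.shape)            # (2, 3)
print(list_of_lists[0][0])  # 67
```

With rows of different lengths, recent NumPy versions refuse to build the 2-D array, so a plain nested list comprehension is the safer choice there.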

emporerblk

This is how I would approach pulling the last column of a DataFrame into a list of ints.

Let's say the .csv is located in the same directory as your .py script and it's called kanten.csv. The column you're looking for is stopsequence.

import os
import pandas as pd

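# folder that contains this .py script (os.getcwd() would be the folder Python was started from)
path = os.path.dirname(os.path.abspath(__file__))
filename = 'kanten.csv'
filepath = os.path.join(path, filename)

kanten = pd.read_csv(filepath)
# named 'stops' rather than 'list' so the built-in list() is not shadowed
stops = list(kanten['stopsequence'].apply(lambda x: int(x)))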

In the last line, the stopsequence column is pulled from kanten, the values are cast to integers, and the result is converted to a standard Python list.
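A self-contained sketch of the same idea, reading made-up CSV text from memory instead of a file. It assumes, as this answer does, exactly one integer per stopsequence cell, and uses `astype(int)` as a vectorized alternative to the per-value lambda:

```python
import io
import pandas as pd

# Hypothetical file content with one integer per stopsequence cell.
csv_text = "stopsequence\n67\n945\n123\n"
kanten = pd.read_csv(io.StringIO(csv_text))

# astype(int) converts the whole column at once; tolist() yields plain Python ints.
stops = kanten["stopsequence"].astype(int).tolist()
print(stops)  # [67, 945, 123]
```

If a cell holds several comma-separated numbers, as in the question, `int()` on the whole cell fails and each cell has to be split and converted item by item instead.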

Ethan King