0

I have created the following dataframe from a csv file:

id      marks
5155    1,2,3,,,,,,,,
2156    8,12,34,10,4,3,2,5,0,9
3557    9,,,,,,,,,,
7886    0,7,56,4,34,3,22,4,,,
3689    2,8,,,,,,,,

It is indexed on id. The values in the marks column are strings. I need to convert each one to a list of numbers so that I can iterate over them and use them as index numbers for another dataframe. How can I convert them from a string to a list? I tried to add a new column and convert the values, based on "Add a columns in DataFrame based on other column", but it failed:

df = df.assign(new_col_arr=lambda x: np.fromstring(x['marks'].values[0], sep=',').astype(int))
Birish
  • Where is this data coming from? This looks like a poor fit for CSV, and for a Pandas DataFrame. – AMC Jan 26 '20 at 20:20
  • Do you want a list, an array, or an ndarray? The title says ndarray, your post says array, the code in your post makes an ndarray, what you've shared in a comment as your expect output is a list, and the accepted answer is a list. – AMC Jan 26 '20 at 20:22
  • @AMC you're right. I was interested in having a list. I'll edit the question now! – Birish Jan 26 '20 at 20:32
  • What about the matter of the data? I think that's far more important than anything else here. – AMC Jan 26 '20 at 20:37
  • @AMC it's from a csv file generated by a script. I don't have access to it, just its output file ¯\_(ツ)_/¯ – Birish Jan 26 '20 at 20:43
  • Yikes, good luck. – AMC Jan 26 '20 at 20:46
  • I forgot to ask: Do you need to keep the entire thing as a DataFrame? What are you using it for? If the values in the `id` column are actually representative of your data, a simple `id` -> `marks` dictionary should suffice. – AMC Jan 27 '20 at 00:56
  • Related, possible duplicate of: https://stackoverflow.com/q/7844118/11301900 – AMC Jan 27 '20 at 01:04

3 Answers

0

Here's a way to do it:

df = df.assign(new_col_arr=df['marks'].str.split(','))

# convert to int
df['new_col'] = df['new_col_arr'].apply(lambda x: list(map(int, [i for i in x if i != ''])))
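For reference, here is a minimal self-contained sketch of the same split-then-convert idea, run against a few rows copied from the question. The DataFrame construction is an assumption about how the CSV was loaded (`id` as the index, `marks` as strings), and the list comprehension is just another spelling of the `map`/filter step above.

```python
import pandas as pd

# Sample rows copied from the question; `id` is the index, `marks` holds strings.
df = pd.DataFrame(
    {"marks": ["1,2,3,,,,,,,,", "8,12,34,10,4,3,2,5,0,9", "9,,,,,,,,,,"]},
    index=pd.Index([5155, 2156, 3557], name="id"),
)

# Split each string on commas, drop the empty pieces, and cast the rest to int.
df["new_col"] = df["marks"].str.split(",").apply(
    lambda parts: [int(p) for p in parts if p != ""]
)

print(df["new_col"].tolist())
# [[1, 2, 3], [8, 12, 34, 10, 4, 3, 2, 5, 0, 9], [9]]
```

Each cell of `new_col` is then a plain Python list of ints, so it can be iterated over directly.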
Birish
YOLO
  • I just tried your solution. The new column will be an array of strings: `['50', '51', '57', '', '', '', '', '', '', '']` . I need it to be an array of integers like: `[50, 51, 57, , , , , , ,']` – Birish Jan 26 '20 at 19:21
  • @Birish Do you want a list, an array, or an ndarray? The title says ndarray, your post says array, the code in your post makes an ndarray, what you've just shared is a list, and the accepted answer is a list. – AMC Jan 26 '20 at 20:21
  • `df['new_col_arr'].apply(lambda x: list(map(int, [i for i in x if i != ''])))` ? Wouldn't `df['new_col_arr'].map(lambda x: [int(num_str) for num_str in x if num_str])` work? – AMC Jan 27 '20 at 00:54
  • not clear, why is the answer downvoted after the OP has accepted it? – YOLO Jan 27 '20 at 06:12
  • @AMC yes, it would work. just different ways of writing the same action. – YOLO Jan 27 '20 at 06:12
0

I presume that you want to create a NEW dataframe, since the number of items is different from the number of rows. I suggest the following:

import numpy as np
import pandas as pd

# source data
df = pd.DataFrame({'id': [5155, 2156, 7886],
                   'marks': ['1,2,3,,,,,,,,', '8,12,34,10,4,3,2,5,0,9', '0,7,56,4,34,3,22,4,,,']})

# create dictionary from df (id -> integer array of marks):
dd = {row['id']: np.fromstring(row['marks'], dtype=int, sep=',') for _, row in df.iterrows()}

{5155: array([1, 2, 3]),
 2156: array([ 8, 12, 34, 10,  4,  3,  2,  5,  0,  9]),
 7886: array([ 0,  7, 56,  4, 34,  3, 22,  4])}

# here you pad the lists inside dictionary so that they have equal length
...

# convert dd to DataFrame:
df2 = pd.DataFrame(dd)
Poe Dator
0

I found two similar alternatives:

1.

df['marks'] = df['marks'].str.split(',').map(lambda num_str_list: [int(num_str) for num_str in num_str_list if num_str])

2.

df['marks'] = df['marks'].map(lambda arr_str: [int(num_str) for num_str in arr_str.split(',') if num_str])
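As a quick check that the two alternatives really are equivalent, here is a small sketch that runs both on rows copied from the question. The Series construction and the `to_int_list` helper are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Rows copied from the question, held as a Series indexed on `id`.
marks = pd.Series(
    ["1,2,3,,,,,,,,", "0,7,56,4,34,3,22,4,,,", "2,8,,,,,,,,"],
    index=pd.Index([5155, 7886, 3689], name="id"),
    name="marks",
)


def to_int_list(num_str_list):
    """Hypothetical helper: keep the non-empty pieces and cast them to int."""
    return [int(num_str) for num_str in num_str_list if num_str]


# Alternative 1: pandas splits the strings, then each list is converted.
alt_1 = marks.str.split(",").map(to_int_list)

# Alternative 2: split and convert inside a single map call.
alt_2 = marks.map(lambda arr_str: to_int_list(arr_str.split(",")))

print(alt_1.tolist())
# [[1, 2, 3], [0, 7, 56, 4, 34, 3, 22, 4], [2, 8]]
print(alt_1.tolist() == alt_2.tolist())
# True
```

The only difference is where the split happens: `str.split` in pandas for the first, plain Python `str.split` for the second.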
AMC