Some elements in a list have delimiter and need to be separated. Then the other values for that element need to be copied to new row. Advice?

Question

This is simplified... I have a matrix where sometimes the first column has cells that need to be delimited by ";". When that is the case, I want to copy the information for that row and create a new df with the appended row(s).

I have a key with values key = [444, 555, 777]

I also have a information_df where each item in the key exists, but sometimes within the same cell separated by ";". information_df looks like this.... and the output should look like this.... (both images in same url):

Example of input and ideal output:

Can anyone please advise on how to do this in python?

The term you're looking for might be *unnest*: https://stackoverflow.com/q/53218931/1424875. — nanofarad, Jun 18 '21 at 16:30

chatax · Accepted Answer · 2021-06-18T16:52:47.627

0

Try splitting the Numbers column values per row.

data = []
for row in df.itertuples():
    for num in str(row.Numbers).split(';'):
        row_data = [getattr(row, column_name) for column_name in df.columns 
                    if column_name != 'Numbers']
        data.append([int(num)] + row_data)
output_df = pd.DataFrame(data, columns=df.columns)

edited Jun 18 '21 at 16:52

answered Jun 18 '21 at 16:29

chatax

990
3
17

When I try to do that I also get the following error... Is this because it isn't recognizing all the values as strings? --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in 1 data = [] 2 for row in df.itertuples(): ----> 3 for num in row.Numbers.split(';'): AttributeError: 'int' object has no attribute 'split' – Kyle Reagan Jun 18 '21 at 16:34
yes apparently you have integers as well. I edited my answer with a solution for this – chatax Jun 18 '21 at 16:40
Thank you! This works great. So if I do this for a data set with more columns, all I have to do is add "row.xyz" in my data.append()? – Kyle Reagan Jun 18 '21 at 16:45
yes, but if you don't like the typing a lot of column names you could do something like: `data.append([int(num)] + [getattr(row, column_name) for column_name in df.columns])` – chatax Jun 18 '21 at 16:50
Can I do the same thing in the output_df line? – Kyle Reagan Jun 18 '21 at 17:17
What thing exactly? – chatax Jun 18 '21 at 17:20
It gives me "ValueError: invalid literal for int() with base 10: 'nan'" --> coming from the data.append line. Can you help me get around that? – Kyle Reagan Jun 18 '21 at 17:35
It sounds like you have a missing value in your Numbers column – chatax Jun 18 '21 at 17:45
Thank you!! Exactly right. Any way I can add code that allows it to treat any blanks just like a string? Or ignore them altogether? – Kyle Reagan Jun 18 '21 at 17:52
Tell me your solution and I will add it to my answer – chatax Jun 18 '21 at 17:58

score 0 · Answer 2 · answered Jun 18 '21 at 17:25

0

This can be best done using the explode function:

df['Numbers'] = df.Numbers.str.split(r'\s*;\s*')
df.explode('Numbers')
  Numbers Description          Date
0     444        Dark      9/2/2022
1     555       Light      9/3/2022
1     777       Light      9/3/2022

answered Jun 18 '21 at 17:25

Onyambu

67,392
3
24
53

Whenever I do this on my df, it outputs NaN for any value that originally only had one number. For elements with multiple numbers separated by ";" it gives me something like [555, 777] on the same row still – Kyle Reagan Jun 18 '21 at 17:43
@KyleReagan Probably your data is different from the one I had. Could you do `df.head().to_dict()` then post the data. – Onyambu Jun 18 '21 at 17:49
<-- this is the practice one I'm using. it will output 444 as NaN – Kyle Reagan Jun 18 '21 at 18:39
@KyleReagan you do have to share your data, IN TEXT FORMAT. not that you provide an image. Now I gave you a code that you should use to share your data. ie `your_data.to_dict()` then copy paste the results here. IF the data is large, you could use `your_data.head().to_dict()` – Onyambu Jun 18 '21 at 18:43

Some elements in a list have delimiter and need to be separated. Then the other values for that element need to be copied to new row. Advice?

2 Answers2