0

This is simplified... I have a matrix where sometimes the first column has cells that need to be delimited by ";". When that is the case, I want to copy the information for that row and create a new df with the appended row(s).

I have a key with values key = [444, 555, 777]

I also have a information_df where each item in the key exists, but sometimes within the same cell separated by ";". information_df looks like this.... and the output should look like this.... (both images in same url):

Example of input and ideal output:

enter image description here

Can anyone please advise on how to do this in python?

martineau
  • 119,623
  • 25
  • 170
  • 301

2 Answers2

0

Try splitting the Numbers column values per row.

data = []
for row in df.itertuples():
    for num in str(row.Numbers).split(';'):
        row_data = [getattr(row, column_name) for column_name in df.columns 
                    if column_name != 'Numbers']
        data.append([int(num)] + row_data)
output_df = pd.DataFrame(data, columns=df.columns)
      
chatax
  • 990
  • 3
  • 17
  • When I try to do that I also get the following error... Is this because it isn't recognizing all the values as strings? --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in 1 data = [] 2 for row in df.itertuples(): ----> 3 for num in row.Numbers.split(';'): AttributeError: 'int' object has no attribute 'split' – Kyle Reagan Jun 18 '21 at 16:34
  • yes apparently you have integers as well. I edited my answer with a solution for this – chatax Jun 18 '21 at 16:40
  • Thank you! This works great. So if I do this for a data set with more columns, all I have to do is add "row.xyz" in my data.append()? – Kyle Reagan Jun 18 '21 at 16:45
  • yes, but if you don't like the typing a lot of column names you could do something like: `data.append([int(num)] + [getattr(row, column_name) for column_name in df.columns])` – chatax Jun 18 '21 at 16:50
  • Can I do the same thing in the output_df line? – Kyle Reagan Jun 18 '21 at 17:17
  • What thing exactly? – chatax Jun 18 '21 at 17:20
  • It gives me "ValueError: invalid literal for int() with base 10: 'nan'" --> coming from the data.append line. Can you help me get around that? – Kyle Reagan Jun 18 '21 at 17:35
  • It sounds like you have a missing value in your Numbers column – chatax Jun 18 '21 at 17:45
  • Thank you!! Exactly right. Any way I can add code that allows it to treat any blanks just like a string? Or ignore them altogether? – Kyle Reagan Jun 18 '21 at 17:52
  • Tell me your solution and I will add it to my answer – chatax Jun 18 '21 at 17:58
0

This can be best done using the explode function:

df['Numbers'] = df.Numbers.str.split(r'\s*;\s*')
df.explode('Numbers')
  Numbers Description          Date
0     444        Dark      9/2/2022
1     555       Light      9/3/2022
1     777       Light      9/3/2022
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • Whenever I do this on my df, it outputs NaN for any value that originally only had one number. For elements with multiple numbers separated by ";" it gives me something like [555, 777] on the same row still – Kyle Reagan Jun 18 '21 at 17:43
  • @KyleReagan Probably your data is different from the one I had. Could you do `df.head().to_dict()` then post the data. – Onyambu Jun 18 '21 at 17:49
  • <-- this is the practice one I'm using. it will output 444 as NaN – Kyle Reagan Jun 18 '21 at 18:39
  • @KyleReagan you do have to share your data, IN TEXT FORMAT. not that you provide an image. Now I gave you a code that you should use to share your data. ie `your_data.to_dict()` then copy paste the results here. IF the data is large, you could use `your_data.head().to_dict()` – Onyambu Jun 18 '21 at 18:43