
I have a program that reads a pandas data frame. One of its columns contains several values separated by : in a single field. To know what these values mean, there is another column that says what each value is.

I want to split these values and put them in new columns. The problem is that not every input to my program has exactly the same layout: the order of the values can change, and new values can appear.

It is easier to explain with an example:

df1

Column1     Column2
GT:AV:AD    0.1:123:23
GT:AV:AD    0.2:456:24


df2

Column1     Column2
GT:AD:AV    0.4:23:123
GT:AD:AV    0.5:12:323

Before I was aware of this issue, I split the data and put it in new columns with something like this:

file_data["GT"] = file_data[name_sample].apply(lambda x: x.split(":")[1]) 
file_data["AD"] = file_data[name_sample].apply(lambda x: x.split(":")[2])

If what I want is GT and AD (when they are present in the input data frame), how can I do this in a more robust way?

2 Answers

1
import pandas as pd
df = pd.DataFrame({"col1":["GT:AV:AD","GT:AD:AV"],"col2":["0.1:123:23","0.4:23:123"]})
df["keyvalue"] = df.apply(lambda x:dict(zip(x.col1.split(":"),x.col2.split(":"))), axis=1)
print(df)

output

       col1        col2                                keyvalue
0  GT:AV:AD  0.1:123:23  {'GT': '0.1', 'AV': '123', 'AD': '23'}
1  GT:AD:AV  0.4:23:123  {'GT': '0.4', 'AD': '23', 'AV': '123'}

Explanation: I create a keyvalue column that holds, for each row, a dict mapping the keys (from col1) to the values (from col2), built with the dict(zip(keys_list, values_list)) construct. apply with axis=1 applies the function to each row, and lambda is Python's way of creating an anonymous function. If you would rather have a pandas.DataFrame than a column of dicts, you can do

df2 = df.apply(lambda x:dict(zip(x.col1.split(":"),x.col2.split(":"))), axis=1).apply(pd.Series)
print(df2)

output

    GT   AV  AD
0  0.1  123  23
1  0.4  123  23
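
If the name of the value column changes between inputs, one option is to pass both column names in as parameters and keep only the keys you need. The following is a minimal sketch under assumptions, not part of the approach above as posted: extract_fields, its parameters, and the column name sample_A are hypothetical names chosen for illustration. Keys that are absent from the data come back as NaN columns instead of raising an error.

```python
import pandas as pd

def extract_fields(frame, format_col, sample_col, wanted=("GT", "AD")):
    """Hypothetical helper: return one column per wanted key.

    Keys that do not occur in a row (or in the whole frame) become NaN.
    """
    # Pair each key with its value row by row, whatever the key order is.
    parsed = [
        dict(zip(keys.split(":"), values.split(":")))
        for keys, values in zip(frame[format_col], frame[sample_col])
    ]
    expanded = pd.DataFrame(parsed, index=frame.index)
    # reindex keeps only the wanted keys and adds missing ones as NaN.
    return expanded.reindex(columns=list(wanted))

df = pd.DataFrame({
    "col1": ["GT:AV:AD", "GT:AD:AV"],
    "sample_A": ["0.1:123:23", "0.4:23:123"],  # assumed column name
})
result = extract_fields(df, "col1", "sample_A")
print(result)
```

Inside a loop you would call extract_fields with whatever the current column name is (for example the name_sample variable from the question). Note that the extracted values are still strings, so convert them with pd.to_numeric if you need numbers.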
Daweo
  • Many thanks, Daweo. I have confirmed that your approach works. But I don't know how to apply it in my case, because the name of my col2 changes on every iteration. – Manolo Dominguez Becerra Oct 07 '21 at 12:33
0

Have a look at this answer:

keys = ['a', 'b', 'c']
values = [1, 2, 3]
dictionary = dict(zip(keys, values))
print(dictionary) # {'a': 1, 'b': 2, 'c': 3}

You need to split your column 1 into a list of keys and your column 2 into a list of values.

That way you can look up dictionary["GT"] and so on.
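
Applied to a single row from the question, this could look like the sketch below. The variable names are illustrative, and DP is a made-up key used only to show what happens when a key is absent: dict.get returns None instead of raising a KeyError.

```python
# One row from the question's df2, split into keys and values.
keys = "GT:AD:AV".split(":")
values = "0.4:23:123".split(":")
record = dict(zip(keys, values))

gt = record.get("GT")
ad = record.get("AD")
dp = record.get("DP")  # hypothetical key that is not in this row

print(gt, ad, dp)  # 0.4 23 None
```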

Koko Jumbo