-1

I have excel file and import to dataframe. I want to extract inside of column to several columns.

Here is original

enter image description here

After importing to pandas in python, I get this data with '\n'

enter image description here

So, I want to extract inside of column. Could you all share idea or code?

My expected columns are....

enter image description here

Bella
  • 43
  • 4
  • 1
    [Please do not upload images of code/errors when asking a question](https://meta.stackoverflow.com/q/285551). – Bill Huang Nov 24 '20 at 16:44
  • @BillHuang It isn't error images. It is the real data – Bella Nov 24 '20 at 17:06
  • PLEASE READ THE LINK FOR THE REASON WHY. Top one of them: "Code or sample data in images can't be copied and pasted into an editor and compiled in order to reproduce the problem." – Bill Huang Nov 24 '20 at 17:10
  • * Please provide sample data in a [reproducible way](https://stackoverflow.com/questions/20109391). Otherwise people won't be able to test. * If your data contains anomalies/corner cases that were not shown in your sample data, please also explain. * Please show what you have attempted to solve the problem. StackOverflow is not a code-writing service. – Bill Huang Nov 24 '20 at 17:14

1 Answers1

1

Don't worry no one is born knowing everything about SO. Considering the data you gave, specially that 'Vector:...' is not separated by '\n', the following works:

import pandas as pd
import numpy as np

data = pd.read_excel("the_data.xlsx")
ok = []
l = len(data['Details'])
for n in range(l):
    x = data['Details'][n].split()
    x[2] = x[2].lstrip('Vector:')
    x = [v for v in x if v not in ['Type:', 'Mission:']]
    ok += x
values = np.array(ok).reshape(l, 3)
df = pd.DataFrame(values, columns=['Type', 'Vector', 'Mission'])
data.drop('Details', axis=1, inplace=True)
final = pd.concat([data, df], axis=1)

The process goes like this:
First you split all elements of the Details columns as a list of strings. Second you deal with the 'Vector:....' special case and filter column names. Third you store all the values in a list which will inturn be converted to a numpy array with shape (length, 3). Finally you drop the old 'Details' column and perform a concatenation with the df created from splited strings.
You may want to try a more efficient way to transform your data when reading by trying to use this ideas inside the pd.read_excel method using converters

Ivan Calderon
  • 580
  • 6
  • 14