I'm new to Python and a bit stuck. I have a dataframe of journal articles and their subject headings. The headings were returned from the API in a string where the subheadings modify the descriptor.

For example, one of the subject headings returned from the API is: "Cardiovascular Diseases/*drug therapy/epidemiology"

It describes an article primarily about drug therapy for cardiovascular diseases AND epidemiology for cardiovascular diseases. In this instance, I'd like to create a column in the dataframe for each of these. I'd like the column to include the initial term + the modifier. Some articles have only 1 term without a modifier, some have 1 term + many subheadings.

Current Dataframe:


+-----------------+------+----------------------------------------------------+
|  Article Title  |  ID  |                      Subject                       |
+-----------------+------+----------------------------------------------------+
| an article      |  123 | Cardiovascular Diseases/*drug therapy/epidemiology |
| another article |  324 | Adult                                              |
| One more        |  234 | United Kingdom/epidemiology                        |
+-----------------+------+----------------------------------------------------+

What I want:


+-----------------+------+----------------------------------------------------+--------------------------------------+----------------------------------------+--------------+
|  Article Title  |  ID  |                      Subject                       |              Modifier 1              |                Modifier 2              |   Modifier 3 |
+-----------------+------+----------------------------------------------------+--------------------------------------+----------------------------------------+--------------+
| an article      |  123 | Cardiovascular Diseases/*drug therapy/epidemiology | Cardiovascular diseases/drug therapy | cardiovascular diseases/epidemiology   |              |
| another article |  324 | Adult                                              |  Adult                               |                                        |              |
| One more        |  234 | United Kingdom/epidemiology                        |  United Kingdom/epidemiology         |                                        |              |
+-----------------+------+----------------------------------------------------+--------------------------------------+----------------------------------------+--------------+

My initial attempt (below) only aimed to separate the initial heading from the modifiers. I'm having a hard time wrapping my head around why it doesn't work for multiple subheadings:

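descriptor, qualifier = [], []
for term in df['Subject']:
    # partition() splits only at the first '/', so every later subheading stays in tail
    head, sep, tail = term.partition('/')
    descriptor.append(head)
    qualifier.append(tail)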
  • Does this answer your question? [How to split a string into a list?](https://stackoverflow.com/questions/743806/how-to-split-a-string-into-a-list) – Tamir Nov 17 '20 at 14:42
  • You could use the `.split(separator)` method. – an4s911 Nov 17 '20 at 14:54

1 Answer

You can use the str.split() method with star-unpacking to separate the heading into variables like this:

>>> title = "Cardiovascular Diseases/*drug therapy/epidemiology"

>>> title, *classifiers = title.split('/')

>>> title
'Cardiovascular Diseases'

>>> classifiers
['*drug therapy', 'epidemiology']

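The code above splits the string on the / separator, assigns the first element to title, and collects all remaining elements in the classifiers list. Note that the leading * on '*drug therapy' is kept, so strip it separately if you don't want it.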
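
To get from the split pieces to the "descriptor/qualifier" strings shown in the desired table, one option is a small helper along these lines. This is only a sketch: the name expand_heading is made up for illustration, and it assumes the asterisk only ever appears at the start of a qualifier and can simply be dropped.

```python
def expand_heading(subject):
    """Return one 'descriptor/qualifier' string per qualifier.

    A heading without qualifiers comes back as a one-element list.
    Assumption: a leading '*' on a qualifier is not wanted in the output.
    """
    descriptor, *qualifiers = subject.split('/')
    if not qualifiers:
        return [descriptor]
    # Re-attach the descriptor to each qualifier, dropping any leading '*'
    return [f"{descriptor}/{q.lstrip('*')}" for q in qualifiers]


# The three example headings from the question
subjects = [
    "Cardiovascular Diseases/*drug therapy/epidemiology",
    "Adult",
    "United Kingdom/epidemiology",
]
expanded = [expand_heading(s) for s in subjects]
```

Each inner list can then be spread across the Modifier columns. With pandas, something like pd.DataFrame(df['Subject'].map(expand_heading).tolist()) should produce one column per position and leave the shorter rows empty, though the column names would still need to be set by hand.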

Granitosaurus